DOCUMENT RESUME

ED 327 567                                        TM 015 998

TITLE           Oversight Hearing on Testing/Assessment/Evaluation To
                Improve Learning in Our Schools. Hearing before the
                Subcommittee on Elementary, Secondary, and Vocational
                Education of the Committee on Education and Labor,
                House of Representatives. One Hundred First Congress,
                Second Session.
INSTITUTION     Congress of the U.S., Washington, D.C. House
                Committee on Education and Labor.
PUB DATE        7 Jun 90
NOTE            130p.; Serial No. 101-111.
AVAILABLE FROM  Superintendent of Documents, Congressional Sales
                Office, U.S. Government Printing Office, Washington,
                DC 20402.
PUB TYPE        Legal/Legislative/Regulatory Materials (090) --
                Reports - Evaluative/Feasibility (142)



EDRS PRICE MF01/PC06 Plus Postage. 

DESCRIPTORS     *Academic Achievement; *Educational Assessment;
                Educational Improvement; Elementary Secondary
                Education; *Evaluation Methods; Federal Government;
                National Surveys; *Standardized Tests; Student
                Evaluation; Test Construction; Testing Problems; Test
                Results; *Test Use; Vocational Education



ABSTRACT 

This document provides statements presented at the
oversight hearing on testing and assessment evaluation to improve
learning in the nation's schools. Walter Haney summarizes the
National Commission on Testing and Public Policy's recent report,
which concluded that while well-designed and responsibly used
assessment is an important source of information, there is too much
national reliance on imperfect and unfair measures. Burton W. Faldet,
on behalf of the Association of American Publishers, discusses test
use and the importance of improving test quality. Walter E. Faithorn,
Jr., representing Friends of Education, highlights the group's
misgivings about test use and the misuse of standardized testing.
Ramsay Selden, of the State Education Assessment Center of the
Council of Chief State School Officers, indicates that better testing
practices are within reach. Much of the discussion subsequent to
these statements centers on the connections among test publishers,
textbook publishers, and the construction of tests. Additional
prepared statements, letters, and supplemental materials are included
from the following persons and agencies: (1) the American Federation
of Teachers; (2) the American Psychological Association; (3) the
Council for Basic Education; (4) Christopher T. Cross, Office of
Educational Research and Improvement; (5) Frederick H. Dietrich, the
College Board; (6) Emerson J. Elliot, National Center for Education
Statistics; (7) Matthew G. Martinez, Representative from California;
(8) Monty Neill, National Center for Fair and Open Testing; and (9)
Daniele G. Rodamar, American University. (SLD)



OVERSIGHT HEARING ON TESTING/ASSESSMENT/
EVALUATION TO IMPROVE LEARNING IN OUR
SCHOOLS



HEARING

BEFORE THE

SUBCOMMITTEE ON ELEMENTARY, SECONDARY, AND
VOCATIONAL EDUCATION

OF THE

COMMITTEE ON EDUCATION AND LABOR
HOUSE OF REPRESENTATIVES

ONE HUNDRED FIRST CONGRESS 
SECOND SESSION 

HEARING HELD IN WASHINGTON, DC, JUNE 7, 1990 



Serial No. 101-111 



Printed for the use of the Committee on Education and Labor 




U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement

EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.

Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this
document do not necessarily represent official
OERI position or policy.



BEST COPY AVAILABLE 



U.S. GOVERNMENT PRINTING OFFICE
WASHINGTON : 1990



For sale by the Superintendent of Documents, Congressional Sales Office
U.S. Government Printing Office, Washington, DC 20402



COMMITTEE ON EDUCATION AND LABOR 



AUGUSTUS F. HAWKINS, California, Chairman

WILLIAM D. FORD, Michigan
JOSEPH M. GAYDOS, Pennsylvania
WILLIAM (BILL) CLAY, Missouri
GEORGE MILLER, California
AUSTIN J. MURPHY, Pennsylvania
DALE E. KILDEE, Michigan
PAT WILLIAMS, Montana
MATTHEW G. MARTINEZ, California
MAJOR R. OWENS, New York
CHARLES A. HAYES, Illinois
CARL C. PERKINS, Kentucky
THOMAS C. SAWYER, Ohio
DONALD M. PAYNE, New Jersey
NITA M. LOWEY, New York
GLENN POSHARD, Illinois
JOLENE UNSOELD, Washington
CRAIG A. WASHINGTON, Texas
JOSÉ E. SERRANO, New York
JAIME B. FUSTER, Puerto Rico
JIM JONTZ, Indiana
KWEISI MFUME, Maryland

WILLIAM F. GOODLING, Pennsylvania
E. THOMAS COLEMAN, Missouri
THOMAS E. PETRI, Wisconsin
MARGE ROUKEMA, New Jersey
STEVE GUNDERSON, Wisconsin
STEVE BARTLETT, Texas
THOMAS J. TAUKE, Iowa
HARRIS W. FAWELL, Illinois
PAUL B. HENRY, Michigan
FRED GRANDY, Iowa
CASS BALLENGER, North Carolina
PETER SMITH, Vermont
TOMMY F. ROBINSON, Arkansas



Subcommittee on Elementary, Secondary, and Vocational Education

AUGUSTUS F. HAWKINS, California, Chairman

WILLIAM D. FORD, Michigan
GEORGE MILLER, California
DALE E. KILDEE, Michigan
PAT WILLIAMS, Montana
MATTHEW G. MARTINEZ, California
CARL C. PERKINS, Kentucky
CHARLES A. HAYES, Illinois
THOMAS C. SAWYER, Ohio
MAJOR R. OWENS, New York
DONALD M. PAYNE, New Jersey
NITA M. LOWEY, New York
GLENN POSHARD, Illinois
JOLENE UNSOELD, Washington
JOSÉ E. SERRANO, New York

WILLIAM F. GOODLING, Pennsylvania
HARRIS W. FAWELL, Illinois
FRED GRANDY, Iowa
PETER SMITH, Vermont
STEVE BARTLETT, Texas
STEVE GUNDERSON, Wisconsin
THOMAS E. PETRI, Wisconsin
MARGE ROUKEMA, New Jersey
E. THOMAS COLEMAN, Missouri






CONTENTS 



Hearing held in Washington, DC, June 7, 1990 ... 1
Statement of:
    Haney, Dr. Walter, Boston College; Dr. Burton W. Faldet, Test
      Consultants, Ltd.; Walter E. Faithorn, Jr., Business Executive and
      Volunteer Teacher at the University of the District of Columbia;
      and Ramsay Selden, State Education Assessment Center, Council of
      Chief State School Officers ... 5
Prepared statements, letters, supplemental materials, et cetera:
    American Federation of Teachers, AFL-CIO, prepared statement of ... 107
    American Psychological Association, prepared statement of ... 101
    Anrig, Gregory R., President, Educational Testing Service, letter
      dated July 19, 1990, to Hon. Augustus F. Hawkins, enclosing
      material for the record ... 86
    Council for Basic Education, prepared statement of ... 104
    Cross, Christopher T., Assistant Secretary for Educational Research
      and Improvement, U.S. Department of Education, letter dated July 9,
      1990, to Hon. Augustus F. Hawkins, enclosing material for the
      record ... 79
    Dietrich, Frederick H., Vice President, Guidance, Access, and
      Assessment Services, The College Board, prepared statement of ... 123
    Elliot, Emerson J., Acting Commissioner, National Center for
      Education Statistics, Office of Educational Research and
      Improvement, U.S. Department of Education, letter dated July 12,
      1990, to Hon. Augustus F. Hawkins, enclosing responses for the
      record ... 81
    Faithorn, Walter E., Jr., Business Executive and Volunteer Teacher
      at the University of the District of Columbia, prepared statement of
    Faldet, Dr. Burton W., Test Consultants, Ltd., prepared statement
      with attachments ... 11
    Friends for Education, Inc., prepared statement of
    Martinez, Hon. Matthew G., a Representative in Congress from the
      State of California, prepared statement of ... 2
    Neill, Monty, Ed.D., Associate Director, National Center for Fair
      and Open Testing, letter dated June 21, 1990, to Hon. Augustus F.
      Hawkins, enclosing material for the record
    Rodamar, Daniele Ghiolfi, Assistant Professor, American University,
      prepared statement of ... 110
    Selden, Ramsay, State Education Assessment Center, Council of Chief
      State School Officers, prepared statement of ... 49






OVERSIGHT HEARING ON TESTING/ASSESS- 
MENT/EVALUATION TO IMPROVE LEARNING 
IN OUR SCHOOLS 



THURSDAY, JUNE 7, 1990

House of Representatives, 
Subcommittee on Elementary, Secondary, 

AND Vocational Education, 
Committee on Education and Labor, 

Washington, DC. 

The subcommittee met, pursuant to notice, at 9:50 a.m., in Room 
2175, Rayburn House Office Building, Hon. Augustus F. Hawkins 
[Chairman] presiding. 

Members present: Representatives Hawkins, Martinez, Hayes, 
Sawyer, Payne, Poshard, Goodling, Smith, Gunderson, and Petri.

Staff present: John Jennings, counsel; Dr. June L. Harris, legisla- 
tive specialist; and Jo-Marie St. Martin, education counsel. 

Chairman Hawkins. The Subcommittee on Elementary, Second- 
ary, and Vocational Education is called to order. The hearing this 
morning is an oversight hearing on testing assessment evaluation 
to improve learning in our schools. 

In order to conserve time, the Chair will not make an opening 
statement at this time other than to indicate that the importance 
of this hearing should be viewed in terms of the importance of
assessment itself. There is no way that we can achieve any of the
goals in education without some form of measurement to assess
where we are today or where we will be by the year 2000, or any
other time.

We have invited a number of witnesses who are highly qualified
in their fields. We are deeply appreciative of their participation in 
the hearing this morning. If other members wish to make any 
statement at this time, the Chair will yield to any member who de- 
sires to make a statement. 

Mr. Martinez. Mr. Chairman, I have a written statement that I
would like submitted for the record, but I won't make a statement.

Chairman Hawkins. Without objection, the statement will be en- 
tered into the record at this point. Other statements that may be 
made by the members will also be included in the record. 

[The prepared statement of Hon. Matthew G. Martinez follows:] 




Congress of the United States
House of Representatives
Washington, DC 20515

MATTHEW G. MARTINEZ



STATEMENT FOR THE HEARING ON 
STANDARDIZED TESTING IN EDUCATION 

BY 

HONORABLE MATTHEW G. MARTINEZ

THE SUBCOMMITTEE ON ELEMENTARY, 
SECONDARY AND VOCATIONAL EDUCATION 



JUNE 7, 1990 






MR. CHAIRMAN, I WOULD LIKE TO SUBMIT A 
WRITTEN STATEMENT FOR THE RECORD. 

MR. CHAIRMAN, WE ARE HERE TODAY
BECAUSE SERIOUS CONCERNS HAVE BEEN
RAISED ABOUT THE ADEQUACY OF TODAY'S
STANDARDIZED TESTS AND ABOUT HOW THEY
ARE BEING USED. FOR EXAMPLE, IN ONE
SCHOOL DISTRICT IN NEW YORK, 61% OF
THE CHILDREN HOPING TO ENTER
KINDERGARTEN FLUNKED A STANDARD TEST
FOR READINESS. AFTER THEY WERE
ASSIGNED TO A SPECIAL TWO YEAR
KINDERGARTEN, A STUDY SHOWED THAT THE
TEST HAD A 50% MARGIN OF ERROR.
THAT IS, IT WAS NO BETTER THAN
FLIPPING A COIN. IN GEORGIA THERE
WERE SIMILAR RESULTS WHEN A PEN AND
PAPER TEST WAS MANDATED FOR PROMOTION
FROM KINDERGARTEN. FLUNKING
KINDERGARTEN IS NOT A JOKE--AND
EDUCATIONAL POLICY SHOULD BE BASED ON
SOMETHING MORE SOLID THAN THE FLIP OF
A COIN.

THESE TESTS ARE BEING MISUSED. IT IS
LIKE THE BODY-COUNTS IN THE VIETNAM
WAR--WE GET HARD NUMBERS ON A
WALL-CHART THAT MAKE GREAT HEADLINES
BUT THEY ARE MISUSED. LIKE THE BODY
COUNTS, THEY CAN TELL US WE ARE
WINNING A BATTLE, WHEN WE MAY BE
LOSING A WAR TO IMPROVE EDUCATION.

DEPENDING ON THE CRITIC, THESE
STANDARDIZED TESTS: (A) MEASURE THE
WRONG SKILLS, (B) DISTORT CLASSROOM
PRACTICE, (C) FALSELY ASSURE PARENTS,
OR (D) DISCRIMINATE AGAINST THE
UNDERPRIVILEGED. THE CORRECT ANSWER
IS PROBABLY "ALL OF THE ABOVE".



WE NEED TO USE TESTS WISELY TO 
IMPROVE EDUCATION. EVEN MORE 
SERIOUSLY WE NEED TO ELIMINATE THE 
MISUSE OF TESTS IF WE ARE NOT GOING 
TO SHORT-CIRCUIT EDUCATION REFORM TO 
PIGEONHOLE KIDS AND SEAL OFF
OPPORTUNITIES FOR ALL AMERICANS. 

I LOOK FORWARD TO HEARING THE
TESTIMONY, AND TO FUTURE CONSIDERATION 
OF THIS IMPORTANT ISSUE. 



THANK YOU. 



[Reproduced article concerning the College Board and Asian-language
achievement tests, with a section headed "Asian Population Growth";
the text is illegible in this copy.]



Chairman Hawkins. The hearing will consist of a panel of experts
in their particular fields. May I ask these witnesses to sit at
the witness table. 

Dr. Walter Haney is Senior Research Associate Director for the 
Study of Testing Evaluation and Educational Policy, Boston Col- 
lege. He's representing the National Commission on Testing and 
Public Policy.

Dr. Burton Faldet, President, Test Consultants, Ltd., Illinois, rep- 
resenting the Association of American Publishers, Inc. 

Dr. Faithorn, Jr., retired business executive, volunteer teacher at
the University of the District of Columbia, representing the
Friends of Education, New Mexico.

Mr. Ramsay Selden, Director of the State Education Assessment
Center, Council of Chief State School Officers. 

Gentlemen, we will recognize you in the order in which your 
names have been called. May we request that your prepared state- 
ments in their entirety be entered in the record, and we hope that 
you will summarize or highlight your testimony so as to leave time 
for questioning at the end of your statements and give us an oppor- 
tunity, in a very informal sense, to try to develop the subject 
matter which will be most productive for the committee. 

We are in the process of drafting a title to an omnibus education 
bill, and we believe that without this title the omnibus approach to
education in a more comprehensive approach would be obviously a 
failure if we do not, in terms of a title on assessment, develop at 
least the beginning of the subject. 

We obviously are not going to conclude this hearing today as the
only hearing on this particular subject matter. We will continue
our communication with you and hope that we can call on you
from time to time to help us in refining the title so that it is
meaningful in terms of approaching the problem. Dr. Haney, you may
proceed. 

STATEMENTS OF DR. WALTER HANEY, BOSTON COLLEGE; DR.
BURTON W. FALDET, TEST CONSULTANTS, LTD.; WALTER E.
FAITHORN, JR., BUSINESS EXECUTIVE AND VOLUNTEER
TEACHER AT THE UNIVERSITY OF THE DISTRICT OF COLUMBIA;
AND RAMSAY SELDEN, STATE EDUCATION ASSESSMENT
CENTER, COUNCIL OF CHIEF STATE SCHOOL OFFICERS

Dr. Haney. Yes, sir. Thank you. My name is Dr. Walter Haney,
and I'm a senior research associate at Boston College. I am here
this morning representing the National Commission on Testing
and Public Policy.

I'm substituting for Dr. Bernard Gifford who had hoped to be
here this morning but unavoidably could not come this morning. So 
I wanted to first pass along Dr. Gifford's apologies for his not being 
here today. 

What I would like to do briefly is to summarize the recent report 
of the National Commission on Testing and Public Policy. I have 
provided copies of the Executive Summary of the National Commis- 
sion's report to the members of the committee. If you would desire 
full copies of the Commission Report, I would most certainly be
pleased to provide them. I simply could not carry copies in my
luggage this morning.

Chairman Hawkins. Doctor, we would probably need about 35 
copies. Every member of the committee should be supplied with a 
copy. 

Dr. Haney. Thirty-five copies? 

Chairman Hawkins. Yes. If you have those available, we would 
appreciate it. 

Dr. Haney. I will get those sent to you as soon as I return to 
Boston. 

The National Commission has worked for three years investigat- 
ing the role of testing in the United States in both the realms of 
education and the realms of employment. The Commission's work 
was motivated by a fundamental concern that America must 
revamp the way it develops and utilizes human talent. 

To do that in the future, as human talent is increasingly becoming
the life-blood of our nation's future, we must restructure testing
so that talent is promoted rather than merely screened or
classified. This will require that we rethink incentives regarding
educational testing and assessment.

The Commission was concerned that currently there is over-reliance
on testing that is predominantly multiple choice in format
and that sometimes leads to unfairness in allocation of
opportunities and too often undermines vital social policies.
Nevertheless, at the outset, I want to make clear that the
Commission--all the members of the Commission--strongly felt that
there is a vital place for testing in both our education and
employment systems.

Specifically, the Commission concluded that well-designed and
responsibly used assessment can be an important source of
information about how our organizations and institutions are doing, what
our children are learning and how well, and who among us is 
likely to make the most of opportunities that cannot be provided to 
all. 

Since you have a summary of the Commission's recently released 
report, let me only very briefly summarize the main findings and 
recommendations of the Commission. 

First, the Commission concluded that tests are imperfect
measures with regard to both individuals' learning and their
employment potential.

Second, testing can result in unfairness. Some uses of testing do, 
in fact, result in unfairness not only for individuals but for identifi- 
able groups of our society. 

Third, in the education realm the Commission concluded that 
there is simply too much testing. There has been a vast increase in 
testing in the Nation's schools over the last 20 to 30 years, and the
Commission concluded that students in our nation's schools are
simply subjected to too much testing. It was estimated that
students spent the equivalent of 20 million school days each year
simply taking standardized tests. 

If I may divert from the text findings of the Commission, let me 
simply illustrate some of the evidence that we accumulated to sup- 
port that finding. 

Chairman Hawkins. We'll get some staff to assist you, volun- 
teers to help out. Do you need some assistance? 







Dr. Haney. I only have a couple of charts. 

Chairman Hawkins. Okay. Well, I want to keep my staff busy. 
You've taken that job away. 

Dr. Haney. We had a stand for the charts, but inadvertently 
someone removed it just before we started. So we will substitute. 
But as an experienced teacher, I am quite familiar with having to 
improvise as I speak. 

This chart simply represents the growth in state testing
programs from 1950 to 1990, as summarized in an Office of Technology
Assessment Report from the U.S. Congress in 1987. 

It only goes to 1987. There has been an increase in state testing
programs since 1987 so that now virtually every state in the
Nation has a state testing program. In addition, districts have their
own testing programs. Additional testing may be mandated as a 
result of other special programs. 

Because of this repetitive testing and unclear evidence that it 
was providing instructionally useful information, the Commission
concluded that there is simply too much testing in the Nation's 
schools. Also, it seems clear that in addition to there being simply 
too great a volume of testing, that some forms of testing may in 
fact be undermining educational efforts in the schools. We found 
evidence that in many places instructional practices had been 
transformed simply into test preparation practices, for example. 

More broadly, in the fourth finding of the Commission, the
Commission concluded that testing is undermining important social
policies, not just in education, but in the employment realm as
well. There are several examples that the Commission cited of this
general finding to illustrate this problem.

The fifth major finding of the Commission was that there's
simply insufficient public accountability regarding standardized
testing programs. While tests have become instruments of
public policy for maintaining accountability, there is insufficient
public accountability with regard to the tests themselves.

Rarely are important tests subject to formal systematic profes- 
sional scrutiny or examination in public. As a result of these gener- 
al findings, the Commission concluded in its fundamental recom- 
mendation that current testing policies and practices need to be 
substantially restructured to help promote the development and talents
of people to become constructive citizens and to help institutions 
become more productive, accountable and just. 

To help promote a vision of how this might be accomplished, the
Commission made eight general recommendations which are
summarized in materials I have provided, so let me only briefly
mention them here.

First, testing policies and practices must be reoriented to
promote the development of all human talent, not just to select among
people or to classify people, but to promote the development of all
human talent.

Second, testing programs should be redirected from reliance on 
multiple-choice tests toward alternative forms of assessment. But I
wish to make clear that the Commission did not think there is any 
one quick fix regarding a better test or a better assessment. These 
sources of information about students' learning must be used flexi- 
bly and in different ways for different purposes with avoidance of 



8 



ovtr-roliance on any one form of assessment, be it a multiple choice 
test or some alternative. 

Third, test scores should be used only when they differentiate on 
the basis of characteristics relevant to opportunities being
allocated. Too often a test is used simply because it is available when in
fact it is not relevant to the opportunities being allocated. 

Fourth, the more test scores disproportionately deny
opportunities to minorities, the greater is the need to show that tests measure
characteristics relevant to the opportunities being allocated,
because they found clear evidence that some uses of tests were in fact
promoting unfairness with regard to allocation of opportunities to 
minorities. 

The final three findings of the Commission I will summarize as 
follows. Test scores are imperfect measures and should not be used 
alone to make important decisions about individuals, groups or
institutions. In the allocation of opportunities, individuals' past
performance and relevant experience must be considered. We can no
longer tolerate bureaucratic decision-making about individuals on 
the basis of single test scores because of the fallibility of all test 
results. 

Sixth, more efficient and effective assessment strategies are 
needed to hold institutions accountable. Right now we have consid- 
erable evidence that testing programs are providing us with mis- 
leading information about the performance of some of our vital 
social institutions. 

Seventh, the enterprise of testing must be subjected to greater 
public accountability, and we must view testing for the purposes of 
accountability separately from testing for the purposes of promot- 
ing individual student learning. 

Eighth, research and development programs must be expanded
to create assessments that promote the development of the talents
of all of our people. While much research and development has
gone on in the past concerning testing and assessment, the
Commission felt strongly that future research regarding testing and
assessment needs to be motivated by the primary goal of testing and
assessment to promote the development of human talent rather
than simply testing and assessment to classify or measure people.

That's a summary of the Commission's report. I will be glad to 
answer questions and provide you with the full copies of the
Commission's report as you requested. Thank you very much.

Chairman Hawkins. Thank you, Doctor. We'll get back to you,
I'm sure, during the questioning period.

At this point, I should like to announce that there's a vote pend- 
ing in the House. Some of the members may care to go and respond 
to the voting or to alternate. Those who do go, I request that you 
return and perhaps bring another member of the subcommittee 
back with you. 

The Chair is not desirous of going over to waste time on a useless 
vote such as this one. 

Mr. Goodling. I don't have an opponent this fall so I'm not
worried.

[Laughter.] 

Chairman Hawkins. Well, at least we have a formal quorum and 
we'll continue. 






Dr. Burton Faldet--I hope I'm correct in pronouncing your
name--President, Test Consultants, Ltd. of Illinois representing the
Association of American Publishers. 

Dr. Faldet. Well, Mr. Chairman, members of the committee, my
name is Burt Faldet. I appreciate this opportunity to appear before 
you today on behalf of the Association of American Publishers. 

AAP is the principal trade organization representing more than
235 member firms that publish hardcover, paperback books,
professional, technical and scientific journals, computer software and
classroom and education materials, including indeed tests and
evaluation and scoring services.

I am President of Test Consultants, Ltd., which provides services 
in evaluation, design and implementation strategies to education 
and business. From 1965 to 1987 I was with Science Research
Associates, a commercial test publisher, where I was involved in a
variety of positions, management and staff, in the development,
publication and use of standardized tests for schools and industry.

I've also taught some courses in measurement. I was a school
psychologist, a science teacher, and director of Pupil Personnel
Services. 

There are several roints that I would like to discuss today about 
the development and use of standardized tests in elementary and 
secondary schools from the perspective of the publisher of such 
tests. My statement does not address higher education, employ- 
ment or military testing. 

The first, and what I hope will be the most important message
I'll leave with you today is that the developers and publishers of
standardized tests should be seen as part of the solution for
improving the quality of educational instruction, not as part of the
problem. 

The second message is that test diversity and competition should
be encouraged to assure improved education and improved
assessment instruments. Different needs for information are served by
different kinds of tests. No one test can accomplish all of the di- 
verse objectives of our educational system. 

It is a serious mistake, of course, to try to make tests do what 
they are not designed to accomplish or to use tests as the sole 
means for assessment in most situations. Finally, I want to assure 
the committee that the test publishers working with the
educational community are and will continue to expand and improve their
testing products to meet continually emerging educational
demands.

In the interest of time, I will leave to your reading the materials
submitted for the record. In them, I've summarized some of our
thoughts on why testing occurs from our perspective, the limits of 
tests, the different kinds of tests and their uses, and the role of the 
test developer and publisher. 

Publishers are not simply printers, bookbinders and marketers. 
They are an integral part of the educational system, providing an 
essential delivery system, as well as taking the initiative for and 
bearing the risk of developing new and innovative materials. 

What recommendations do we have for Congress? The first is
that you continue to hold hearings such as this on education issues,
particularly testing, as a prelude to any possible further action.



Second, Congress should continue to assure diversity of testing. No
single test, no single curriculum, no single textbook can or should 
meet our nation's diverse educational needs. 

Competition among test developers, including a vigorous private 
sector, should be encouraged. Publishers have a very vital role in 
making whatever test program may be adopted by a school work. 
They provide an economical and efficient delivery system for as- 
sessments of many kinds. Publishers have traditionally served as 
an important bridge between sound theory and sound practice. 

Indeed, they have been the vehicle for getting local school
acceptance of new concepts and the resulting products. They have been
the primary link between those who create and those who must im- 
plement. We do not see a change in this role nor do we believe that 
a change is desirable. For this reason it is important to involve 
publishers in the early conceptualization of products resulting from 
sound research. 

One of the crucial concerns is the proper interpretation of test 
results. One suggestion we would have for you might be to provide 
funding for targeted in-service training to teachers and administra- 
tors in interpreting test results to enable them to use tests better 
to improve instruction and to convey information to students, par- 
ents and the public. 

State and local education agencies might be encouraged, if not 
required, to develop a comprehensive assessment plan which would 
identify instructional and accountability goals and objectives and 
those assessment instruments that would be used to achieve them 
and measure progress. The plan could include specific programs for 
in-service training, public information and for assuring that tests
are selected, used and interpreted appropriately. 

We do not believe that the Federal Government should get inti- 
mately involved in state and local testing business. Continued fi- 
nancial and technical support for research and development on in- 
novative assessments, as now provided by the Department of Edu- 
cation and the National Science Foundation, would enable contin- 
ued progress toward improving educational assessments. 

Thank you for your attention. I would be pleased to respond to 
any questions the committee may have. 

[The prepared statement of Dr. Burton W. Faldet follows:]



Association of American Publishers, Inc.

1718 Connecticut Avenue NW, #700
Washington, DC 20009
Telephone 202 232 3335
FAX 202 74%-06*4



STATEMENT BY BURTON W. FALDET 
ON BEHALF OF THE ASSOCIATION OF AMERICAN PUBLISHERS
BEFORE THE SUBCOMMITTEE ON ELEMENTARY, SECONDARY,
AND VOCATIONAL EDUCATION 
COMMITTEE ON EDUCATION AND LABOR 
JUNE 7, 1990 



Mr. Chairman and members of the Committee, my name is Burt
Faldet. I appreciate this opportunity to appear before you
today on behalf of the Association of American Publishers. The
Association of American Publishers ("AAP") is the principal
trade organization representing more than 235 member firms that
publish hardcover and paperback books; professional, technical, 
and scientific journals; computer software; and classroom and 
educational materials, including tests and evaluation and 
scoring materials. 

I am President of Test Consultants, Ltd., which provides
evaluation, design, and implementation strategies to education
and business. Our clients have included commercial test
publishers, the American Institutes for Research, IBM, as well
as individual school districts. From [year illegible] to 1987, I was
with Science Research Associates, a commercial test publisher, where
I was involved in a variety of positions in the development,
publication, and use of standardized tests for schools and
industry. I also have taught undergraduate courses in
Measurement and Evaluation and secondary school science, and
served as a School Psychologist and Director of Pupil Personnel
Services.

There are several points that I would like to discuss today
about the development and use of standardized tests in
elementary and secondary schools, from the perspective of the
publishers of such tests. My statement does not address higher
education, employment, or military testing.

The first, and what I hope will be the most important
message I leave with you today, is that developers and
publishers of standardized tests should be seen as part of the
solution for improving the quality of educational instruction,
not as part of the problem.



The second message is that test diversity and competition
should be encouraged to assure improved education and improved
assessment instruments. Different objectives are served by
different kinds of tests; no one test can accomplish all of
the diverse objectives of our diverse educational system. It is
a serious mistake to try to make tests do what they are not
designed to accomplish, or to use tests as the sole means for
assessment in most situations.

Finally, I want to assure the Committee that test
publishers, working with the educational community, are
expanding and improving their testing products to meet
continually emerging educational demands.



WHY TEST? 

Measurement can be relatively exact, but a number has no
meaning until someone makes a judgment about it. That is the
difference between measurement and evaluation. There are many
ways to determine health; a number on a thermometer is one 
indicator, but it takes someone to exercise judgment as to the 
significance of the temperature shown, and to take the 
appropriate action as indicated by the reading on the 
thermometer. It would be imprudent, however, to rely entirely 
on temperature to make a diagnosis of the patient. 

Why educational testing? Testing is of value to the 
student. It serves to provide some information that can be used 
by educators and parents to identify and respond to the 
instructional needs of individual pupils and to improve 
instruction of individual pupils. Testing is a means to assess
progress toward specific educational objectives, as evidenced by 
what pupils can do in terms of skills exhibited. 

Testing also serves broader, institutional goals. It 
assists in assessment of long-range effects of changes in the
educational program, enabling comparison of (1) performance over
time, in relation to changes in the instructional program or to
changes in population characteristics, and (2) performance across
different subject areas, such as mathematics and reading, to
determine strengths and weaknesses, needs for program
modification, or changes of emphasis. Testing is one means to
evaluate performance for accountability purposes. 

The methods of evaluating whether children are learning what
is being taught have changed over the years, just as many
techniques and objectives of teaching have changed. For
example, standardized achievement tests and numerous other types
of tests have supplemented teacher-made tests administered on a
class-by-class basis. 



LIMITS TO TESTING 

It must be emphasized, however, that there are limits to
testing. When testing is used in "high-stakes" situations and
results are used as a simple "pass/fail" barrier to students, or
to reward or punish teachers and administrators, when the
pressure becomes so intense that there is "teaching the test"
rather than teaching the skills and concepts that are being
evaluated, when test scores become the sole criteria for
evaluating student performance or potential or the effectiveness
of instruction, then testing has gotten out of hand and is being 
misused and abused. 

Tests are a necessary but not sufficient means to assess
achievement and growth in skills and abilities. What may be
tested is not, and cannot be, inclusive of all of the desired
outcomes of instruction.

Tests may be used as a partial basis for evaluation. Tests
are concerned only with certain basic skills and abilities and
are not intended to measure total achievement in any given
subject or grade; they are not inclusive of all the desired
outcomes of education. Standardized tests are concerned with
only those areas of instruction that are amenable to objective
measurement.

It should also be recognized that local performance is 
conditioned by many influences. The instructional effectiveness 
of the teaching staff is only one of these factors. Among other
factors are the pupils' school and home environment, their past
educational history, and the quality and adequacy of the 
instructional materials with which the staff has to work. 

As stated in the Manual for School Administrators for one 
standardized test:

At all times, the tests must be considered a means to 
an end and not ends in themselves. These tests have 
their principal value in drawing attention of the 
teaching staff and the pupil to those specific aspects 
of the pupil's development most in need of individual 
attention; in facilitating remedial and individualized 
instruction; in identifying those aspects of the whole
program of instruction most in need of increased 
emphasis and attention; and in providing the basis for 
more adequate educational guidance of the individual 
pupil. If properly used, the results should motivate 
both teachers and pupils to increased, better-directed 
efforts in both teaching and learning. 



When intelligently used in combination with other
important types of information, the results obtained
from these tests should prove very valuable in the
appraisal of the total program of instruction. Unless
they are used in conjunction with other information,
however, they may do serious injustice to many teachers
and to many well conceived instructional programs. 



KINDS OF TESTS 

Different tests have been developed to meet a variety of 
purposes. Some tests are subjective, both as to the matter
tested and the interpretation of the results. A standardized
test is an objective test that uses the same standards to
measure student performance across the country; everyone takes 
the same test according to the same rules. 

A norm-referenced test (NRT) is a standardized test used to
compare students' performance in terms of a carefully selected,
nationally representative group, or norm, on the same test;
performance is based on total test or subtest scores. (In
contrast, for some tests, such as the SATs and ACT, the norm is
based on the others taking the test, rather than on a
standardized national norm.)

A criterion-referenced test (CRT) differs from a
norm-referenced test primarily in how test scores are
interpreted and used. A criterion-referenced test is used to
evaluate and report performance in terms of specific
instructional objectives or skills, stated in measurable terms.

These labels are not mutually exclusive. Many 
criterion-referenced tests are normed, and many norm-referenced
tests may be subject to criterion-referenced, content-based
interpretations.

Teacher-made tests generally are intended to provide
information about individual students' performance on specific,
classroom-oriented curricula or specific needs for information
about students. These tests are frequently supplemented by
textbook tests, which are developed by textbook publishers and
may appear in textbooks or be provided to teachers as
supplementary instructional materials. Both of these tests are
associated frequently with grades on report cards and help
measure a student's progress in class, as well as facilitate
individualized instruction.

Tests can also be in a variety of formats. Multiple-choice
tests offer the advantages of objectivity and uniformity of
scoring, ease of administration and scoring, and low cost.
There are disadvantages to such tests, particularly if they are
utilized as the exclusive method of assessment.
"Performance-based tests," "authentic assessments," or
"alternative assessments" generally are open-ended tests that
are not multiple-choice. They include essays, writing samples
and portfolios of work, practicums, or oral or visual
demonstrations. They generally are more expensive,
labor-intensive, and require more training and preparation to
administer and evaluate, factors which also can make them
affirmative educational tools. The same concerns for validity
and reliability, standardization if used for comparisons, and
abuse if used in high-stakes situations that are raised with
multiple-choice tests are applicable to performance tests.
Performance testing and standardized testing are not 
mutually exclusive. It is important to point out that for 
several years writing and listening assessments, which are
performance tests, have been offered by test developers as part of their
standardized test batteries. Publishers are now offering 
portfolio tests to supplement their current test batteries. 

What are the particular advantages of a norm-referenced, 
standardized test? It ensures reliability and validity in data
collection, analysis, and interpretation. It enables evaluation
of student achievement in various grades and subjects for the
purpose of aggregating and reporting achievement gains in terms
of a common reporting scale (e.g., normal curve equivalent or
grade equivalent), with nationally representative norms. It
provides an objective, rather than a subjective, assessment.

Norm-referenced, standardized tests also enable 
identification of problems in specific skill or subject area 
deficiencies for teacher attention and remediation. This may be 
particularly important in the early grades.

Norm-referenced, standardized tests use the same or parallel 
test items for all students, which makes scores for all students
comparable; use of one level per grade facilitates 
criterion-referenced interpretation of results for classes, 
buildings, and systems. Individual scores can be related to 
comparable national norms. One skill can be compared to another 
on a pupil, class, building, or system basis. 

A classroom may have such a wide range of skills that no
single test can be equally suited to the entire range of
achievement; NRTs for different levels of achievement can be
administered so that each pupil takes the level that corresponds
most closely to the individual instructional objectives and
levels of skill development.



ROLE OF THE TEST DEVELOPER AND PUBLISHER

Test developers adhere to strict standards, as developed by
[...] Education in the Code of Fair Testing Practices in Education
[...] or linguistic backgrounds [...].

[...] books give in-depth, candid reviews of available tests, including
the Mental Measurements Yearbook, published by the Buros
Institute of Mental Measurements, while guides and evaluations
are published by the ERIC Clearinghouse on Tests, Measurement,
and Evaluation and by other organizations.

[...] Initiative").

[...] including the educational community, the
test publisher determines if there is a [...] and
whether it will be financially viable. [...] test to be
[...] test requirements of state and local education agencies.

Extensive research is required for [...]
the students to be tested. Items must also be free from ethnic,
gender, or cultural bias.

At least one tryout to obtain data for standard item
analysis and summary test statistics is needed. This data is
used to select items with desirable characteristics. Typically,
an experimental edition will contain at least twice the number
of items required for the final test, to enable the publisher to
reject undesirable items and still retain a sufficient number of
items for a final test of suitable length. Items for a
norm-referenced test will be rejected if too many examinees
select the correct answer. In a criterion-referenced test students are
classified in terms of mastery/non-mastery, so items will be
selected that will have a large number of correctly-selected
answers.
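
The item-selection step described above can be sketched in code. This is a minimal illustration only, assuming that the screening statistic is classical item difficulty (the proportion of tryout examinees who answer an item correctly). The function names, the tryout data, and the cutoff values (0.30, 0.90, and 0.80) are hypothetical and are not taken from any publisher's procedure.

```python
def item_difficulty(responses):
    """Proportion of examinees answering each item correctly.

    responses: list of rows, one per examinee; each row holds 1 (correct)
    or 0 (incorrect) for every tryout item.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_examinees
            for i in range(n_items)]


def select_for_nrt(p_values, low=0.30, high=0.90):
    """Keep items that are neither too hard nor too easy (hypothetical cutoffs)."""
    return [i for i, p in enumerate(p_values) if low <= p <= high]


def select_for_crt(p_values, minimum=0.80):
    """Keep items that most examinees answer correctly (hypothetical cutoff)."""
    return [i for i, p in enumerate(p_values) if p >= minimum]


# Hypothetical tryout data: 5 examinees (rows) by 4 items (columns).
tryout = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
p_values = item_difficulty(tryout)
nrt_items = select_for_nrt(p_values)
crt_items = select_for_crt(p_values)
```

In this made-up tryout, the item that every examinee answers correctly is dropped from the norm-referenced pool but kept for the criterion-referenced pool, which mirrors the contrast drawn in the paragraph above.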



Experimental test items are reviewed by educators and 
curriculum specialists and are then field tested with large 
numbers of students to check their responses. The comments of
the reviewers and the data generated by the field test are used 
to select the items for the final edition of the test. 

The final, or standardized, version of the test is 
administered to carefully selected groups of students whose 
characteristics are similar to those of students throughout the 
nation. The information obtained is then aggregated into norms
so that individuals tested in the future may be compared to the
original national sample. This is the process of
standardization, and the normative information obtained from the
process is crucial to educators, parents, and students. Without
it there would be no way of knowing how a single score on a
specific test compared to the scores of other students in the 
nation. 
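
To make the idea of comparing a later score to the original national sample concrete, the sketch below computes a percentile rank against a small norm sample. It is an illustration under stated assumptions: the norm scores are invented, the function name is hypothetical, and the convention used (percent scoring below plus half the percent scoring exactly at the score) is one common definition, not necessarily the one any particular publisher uses.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(norm_scores, score):
    """Percentile rank of `score` within a sorted norm sample.

    Counts the share of the norm group scoring below the given score,
    plus half the share scoring exactly at it.
    """
    below = bisect_left(norm_scores, score)
    at = bisect_right(norm_scores, score) - below
    return 100.0 * (below + 0.5 * at) / len(norm_scores)


# Hypothetical norm sample of raw scores (sorted, as bisect requires).
norm_sample = sorted([12, 15, 15, 18, 20, 22, 22, 22, 25, 30])
pr = percentile_rank(norm_sample, 22)
```

A real norm sample would contain many thousands of pupils drawn by a sampling plan; ten scores are used here only so the arithmetic can be followed by hand.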

Publishers develop guidance materials to assure that the
final test is administered in accordance with the
standardization, and to provide instruction on how the test is
to be interpreted. Information is also developed and provided
on the technical characteristics of the test to support its
reliability and validity.

Scores can be reported and evaluated in a multitude of ways, 
for different uses. Rather than trying to describe scoring and 
interpretation in my testimony, I am submitting for the record 
an excerpt from Understanding Achievement Tests: A Guide for
School Administrators, published by the ERIC Clearinghouse on
Tests, Measurement, and Evaluation, on "What Types of Test
Scores Are There."

Much controversy has been generated recently over
norm-referenced testing. To address these concerns, I am
attaching to this statement several articles from commercial
test publishers that were included in the Summer 1988
Educational Measurement: Issues and Practice that provide an 
extensive review of those issues. 



WHAT SKILLS ARE TESTED? 

Higher order skills, not just basic skills, can be measured,
even in a multiple-choice format, in a standardized test
(remembering that it was only a very few years ago that
publishers had to respond to demands for assessment instruments
for the "back to basics" movement). We recognize that there are
more direct ways of measuring higher order skills.

As previously stated, the multiple-choice format used in
assessment instruments has some attractive features. It is an
efficient and effective way of measuring many educational
objectives. While we recognize that it has limitations as well,
it is important to recognize that most measures, including
criterion-referenced and performance tests, are samples of
behavior from which inferences can be drawn. For example, a
multiple-choice mathematics test, which includes five exercises
in addition of two-digit numbers with carrying, is a sample of
all the possible two-digit numbers that we want a student to be 
able to add. For efficiency, we chose five exercises, and based 
on the student's performance on those, we infer what the student 
could do if presented with many more. Similarly, we may present 
a situation with several complex problem-solving exercises in a 
multiple-choice format. Based on performance, we can make some 
inferences about the student's performance in some of the higher 
order skills in the mathematics area. 
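
The sampling idea in the two-digit addition example can be sketched as follows. This is an illustration only: it assumes that "with carrying" means the ones digits sum to 10 or more, and the function names, the fixed random seed, and the student's answers are all hypothetical.

```python
import random

# Every ordered pair of two-digit addends whose ones digits sum to 10 or
# more, i.e. the addition requires a carry out of the ones column.
domain = [(a, b)
          for a in range(10, 100)
          for b in range(10, 100)
          if (a % 10) + (b % 10) >= 10]


def draw_exercises(k=5, seed=0):
    """Draw k exercises from the domain (seed fixed for repeatability)."""
    return random.Random(seed).sample(domain, k)


def proportion_correct(exercises, answers):
    """Share of sampled exercises the student answered correctly."""
    right = sum(1 for (a, b), ans in zip(exercises, answers) if ans == a + b)
    return right / len(exercises)


exercises = draw_exercises()
# Hypothetical student: answers the first four correctly, misses the fifth.
answers = [a + b for a, b in exercises[:4]] + [0]
estimate = proportion_correct(exercises, answers)
```

Under this definition the domain holds 3,645 possible exercises, so five items are a very small sample, and the resulting proportion is an inference about the whole domain rather than a direct measurement of it.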

Similarly, we can infer some important aspects of
performance in writing from items commonly presented in
multiple-choice language arts tests.

Neither the problem-solving nor language arts tests are
substitutes for direct observation of student performance over
time and in different situations in solving problems and in
producing written material. 

Reiterating a constant theme of this statement, that tests
need not be mutually exclusive, I again want to point out that 
publishers of standardized tests currently also offer tests of 
listening skills and writing in addition to multiple choice 
tests, as well as portfolio tests. 

Whether multiple-choice or performance tests, the keywords
for the future, as they are today, are validity and
reliability. Publishers cannot and should not market a test
unless it has been demonstrated to be valid and reliable. This
requires time and money, extensive research and development,
testing and reworking to assure that the test works.



RECOMMENDATIONS AND SUGGESTIONS FOR FEDERAL POLICY

On behalf of the publishers of standardized tests, I welcome 
this opportunity to meet with the Committee and discuss
standardized tests and our role in the educational process. As
I said at the beginning of my statement, publishers want to be
part of the solution, not part of the problem. Publishers are
not simply printers, bookbinders, and marketers. They are an
integral part of the educational system, providing an essential
delivery system as well as taking the initiative for and bearing
the risk of developing new and innovative materials. Just as
Congress would not think of addressing the future of the
automobile without consulting with automobile manufacturers, 
publishers should continue to be consulted and included in your 
continued deliberations over the quality of education. 

What recommendations do we have for Congress? The first is
that you continue to hold hearings such as this on education
issues, particularly testing, as a prelude to any possible
future action. 

Second, Congress should continue to assure diversity of 
testing. No single test, no single curriculum, no single
textbook, can or should meet our nation's diverse educational
needs. Competition among test developers, including a vigorous
private sector, should be encouraged.

Publishers have a role in making whatever testing program
may be adopted by a school work. They provide an
economical and efficient delivery system for assessments.
Publishers have traditionally served as an important bridge
between sound theory and sound practice. Indeed, they have been
the vehicle for getting local school acceptance of new concepts
and the resulting products, and for enhancing and modifying
those products as needed. They have been the primary link
between those who create and those who must implement. We do
not see a change in this role, nor do we believe that a change
is desirable. For this reason, it is important to involve the
publishers early in the conceptualization of products resulting 
from sound research. 

One of the crucial concerns is the proper interpretation of 
test results. One suggestion might be to provide funding for
targeted, in-service training to teachers and administrators in
interpreting test results to enable them to use tests better to
improve instruction, and to convey information to students,
parents, and the public.

State and local education agencies might be
required to develop a comprehensive assessment plan, which would
identify instructional and accountability goals and objectives
and the assessment instruments that would be used to achieve
them and measure progress. The plan could include specific
programs for in-service training, public information, and for
assuring that tests are selected, used, and interpreted
appropriately.

We do not believe that the federal government should get 
into the state and local testing business. Continued financial 
and technical support for research and development on innovative 
assessments, as provided by the Department of Education and the 
National Science Foundation, would enable continued progress 
toward improving educational assessments.

I would be remiss if I did not point out that while
publishers are trying to respond to the need to develop
challenging and innovative tests (parallel efforts are being
undertaken by publishers of textbooks and other instructional
materials), federal tax policy is frustrating its achievement.

The Department of the Treasury is insisting that publishers 
of tests and instructional materials capitalize research and 
development and other pre-publication costs, a position that
falls with special weight on preparation of new tests and 
instructional materials, with their high development costs, high 
risks, and long lead times. This approach is shortsighted as a 
matter of educational policy because it discourages the 
development of the innovative quality tests and textbooks our 
schools need. It is also discriminatory and unjustified tax 
policy because it requires capitalization of product development 
and research costs that, for any other industry, could be 
deducted in the year incurred. We have requested the 
tax-writing committees (and the Administration) to provide
appropriate relief, but the outcome remains very uncertain. 
This Committee's assistance in assuring that tax policy does not 
frustrate education policy would be most welcome.

Thank you for your attention. I would be pleased to 
respond to any questions the Committee may have. 



Educational Measurement: Issues and Practice
Volume 7, Number 2
Summer 1988



Riverside Comments on the 
Friends for Education Report 



Edward C. Drahozal
Riverside Publishing Company
and
David A. Frisbie
The University of Iowa



The authors, both affiliated with Riverside Publishing Company,
discuss the factors they think explain the Lake Wobegon phenomenon
and call for more appropriate use of normative comparisons and
more complete reporting of test results.



The final report of Friends for Education (FFE), "Nationally
[...]

[...] accuracy and comparability of norms, currency of norms,
[...]

[...] the content of the final report, [...]

[...] different test batteries, different standardization samples
[...] different years of norming, and different score scale units
for [...]

[...] of state and district performance reported by FFE appear to
have been conducted as carefully as possible under the
circumstances. The issues that the [...] under accountability
pressure are not new. What is [...] at the same time, unexpected,
disappointing, [...] is the apparent universal appeal of the
simplistic objective of being above the national average and the
extent to which schools are successful in managing somehow to
appear above the national average when faced with pressures and
even ultimatums from politicians, press, the courts, and even
watchdog groups.

Despite the shortcomings that can be cited regarding the nature of
the data that FFE analyzed and reported, there is ample evidence to
warrant close examination of the group's fundamental question: Why
are so many pupils or schools or states appearing to perform above
the national average? The question seems as simple and
straightforward as the one posed several years ago: Why are test
scores declining from year to year? We believe that the question
raised by FFE rivals the score-decline question in significance and,
as was true of the score-decline inquiry, this search for resolution
is likely to yield multiple, concomitant explanations. There is no
single best answer. A closer examination of the issues by FFE, the
publishers, and the state and district test coordinators might
enhance our ability to use test data to further our primary goal: to
improve the quality of instruction provided in our schools. With
this purpose in mind, the remainder of this paper is devoted to
identifying what we believe are the most crucial issues and to
presenting a scheme that we would use to compare the performance of
state or district groups with national pupil or school norm groups.

Some Major Issues

Accuracy of Norms

National norms for standardized achievement tests are based on a
sample of pupils and schools (attendance centers) obtained through a
complex, multistage sampling scheme. Each publisher strives to
ensure that the national population of pupils and schools is
properly represented in its norms sample. For example, in the
standardization of the Iowa Tests of Basic Skills (ITBS) in 1984-85,
districts were chosen on the basis of geographic region, enrollment
size, and socioeconomic characteristics of the communities served.
The standardization is a joint responsibility of the authors,
publishers, and school personnel. Rigid conditions for district
participation included the provision for sampling attendance centers
of the district by the publisher rather than by the school
administration. An adequate sampling plan is necessary but not
sufficient to guarantee adequate norms. Only if the plan is
realized, only if the sample obtained reflects the sample desired,
will the norms represent national pupil or school achievement
accurately.

To the extent that any publisher's norms misrepresent the national
distribution of pupil and school achievement, comparisons with
either of these norm groups will distort the estimated achievement
level of the group in question. An underrepresentation of
high-achieving schools or high-achieving pupils will cause the
national norms to be "softer" than they ought to be. That is, an
average-achieving pupil will appear to be above average when
referenced to a group whose average is below their theoretical or
"true" average.
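
The effect of "soft" norms can be illustrated with a short calculation. The sketch below is a simplification under loud assumptions: scores are treated as normally distributed, and the scale values (a true national mean of 200, a standard deviation of 20, and an obtained norm sample averaging 196) are hypothetical numbers chosen only for illustration.

```python
from statistics import NormalDist


def percentile_against_norms(score, norm_mean, norm_sd):
    """Percentile of `score` within a normally distributed norm group."""
    return 100.0 * NormalDist(mu=norm_mean, sigma=norm_sd).cdf(score)


# Hypothetical scale: the true national average is 200 with SD 20, but the
# obtained norm sample under-represents high achievers and averages 196.
TRUE_MEAN, SOFT_NORM_MEAN, SD = 200.0, 196.0, 20.0

against_accurate_norms = percentile_against_norms(TRUE_MEAN, TRUE_MEAN, SD)
against_soft_norms = percentile_against_norms(TRUE_MEAN, SOFT_NORM_MEAN, SD)
```

With these made-up values, a pupil exactly at the true national average sits at the 50th percentile of accurate norms but at roughly the 58th percentile of the softer norms.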

The sampling plan, nature of the obtained sample, and weighting
schemes used in the standardization of each achievement battery in
question should be examined to determine the representativeness of
the published norms. This should be done separately for pupil and
school norms.

Recency of Norms

It is a well-documented fact that achievement in grades 3-8 has been
rising steadily since the late 1970s. Though the year-to-year
differences might be regarded as minor (.3 of a grade-equivalent
month, on the average), the cumulative effect over 10 years is
significant (approximately 3 months, on the average). Obviously,
those who compare the 1987 performance of their pupils with that of
other pupils who were tested in 1978 (national standardization) will
be using "softer" norms and will have more pupils appearing to be
above the national average than really are.
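
The cumulative-effect arithmetic above can be written out explicitly. This is a minimal sketch that assumes a constant average gain of .3 grade-equivalent months per year, which is the figure stated in the text; the function name and the nine-year example are hypothetical.

```python
DRIFT_PER_YEAR = 0.3  # grade-equivalent months per year, as stated in the text


def cumulative_drift(years_since_norming, per_year=DRIFT_PER_YEAR):
    """Total rise in achievement, in grade-equivalent months, since norming."""
    return round(years_since_norming * per_year, 1)


ten_year_drift = cumulative_drift(10)   # the 10-year case cited in the text
nine_year_drift = cumulative_drift(9)   # hypothetical nine-year-old norms
```

Actual year-to-year changes vary by test, grade, and score level, so a straight-line figure of this kind is only a rough guide to how far a set of norms may have softened.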

We have published information on changes in student performance for
the past 30 years. Data for 1955 to 1984 are summarized on pages
148-153 of the new ITBS Manual for School Administrators (Hieronymus
& Hoover, 1986). Differences in performance vary by test, grade, and
score level. The 1977-85 composite score differences are eight to
nine percentile ranks (PRs) at the median in most grades, but
differences in language exceed 10 PRs in several grades at several
score levels.

In periods of fluctuating achievement levels, the recency of the
norms is a critical issue. When achievement levels are relatively
stable over time, as they have tended to be at the grade K-2 levels,
"old" norms do not interfere with score interpretations, assuming
that we have curriculum stability as well.

Nature of Tested Population

If we have good reason to believe that pupils in a given state
should have scores, on the average, below the national average,
we must be certain to define the population for which we expect
the prediction to hold. There are several related issues
regarding this point with respect to the FFE data. If State X
reports a mean normal curve equivalent (NCE) for 45,000 fourth
graders, we should ask these questions: How many fourth graders
were tested but not included in the computation of the mean,
and what is the nature of the scores of those who were excluded
from reporting? How many fourth graders are there in State X
who were not tested and, consequently, who were not included in
the reported scores? And what are the achievement levels like
for these students who were not tested? Based on the Department
of Education's Center for Education Statistics fall 1985
enrollments projected to 1986, the percentage of students for
whom scores are reported in the FFE report varies from a low of
about 85% to more than 96% of total grade enrollments for most
states for which full grade testing was reportedly done. (For
one state with public school enrollments of about 48,000
students per grade, averages and PRs are reported for
approximately 37,500 students, which is about 80% of the total
enrollment.) The discrepancy between the reported state scores
and the expectations in the FFE report may be in part due to
such differences between tested and total enrolled populations
of students and specifically to the nature of the portion of
the student population not tested.
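
The coverage figure in the parenthetical example can be checked
with a short sketch. The function name `coverage_pct` is
illustrative and not from the source; the counts are the
approximate figures quoted in the text.

```python
def coverage_pct(tested: int, enrolled: int) -> float:
    """Percentage of enrolled pupils for whom scores were reported."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return 100.0 * tested / enrolled


# Approximate figures quoted in the text for one state, per grade.
pct = coverage_pct(tested=37_500, enrolled=48_000)
print(f"{pct:.1f}% of enrolled pupils have reported scores")
```

The result is a little under the "about 80%" given in the text,
which is consistent with the rounded counts quoted there.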

Adequacy of Expectations

Educators have developed some expectations about how pupils and
groups of pupils might perform on achievement tests based on
their study of the relationship of school achievement to other
social, political, and economic variables. This is why we use
such variables as enrollment size and socioeconomic status for
stratified sampling in standardizations. FFE has used some of
these relationships in attempting to develop expectations for
state-level and school-district-level performance. Per-capita
income, graduation rate, and college entrance score averages
are among the "standard barometers of excellence" employed by
FFE. Though we do not deny the value of these indicators as
part of the prediction equation, we realize that it is not
possible to predict achievement in this way with high accuracy.
For example, the achievement test performance of Iowa pupils is
among the very highest in the nation, yet these facts about
educational conditions in Iowa seem inconsistent with that high
level: Iowa ranks 27th among states in per-pupil expenditure,
39th in average teacher salary, and 44th in spending increase
from FY86 to FY87.

In view of the less-than-perfect relationships between
achievement and these other variables, the precision of
whatever expectations about achievement we may formulate should
be tempered. That is, what we are able to say with reasonable
assurance about how many pupils or schools should score above a
specific point (the mean in the national norms distribution) is
not very useful. Consequently, we might instead settle for
statements like these for State X: "About 40% of the fifth
graders tested should score between the 25th and the 75th
percentiles on national pupil norms," or "About 49-55% of the
third graders tested should score above the national pupil
median (50th percentile)." Of course, the ability to make such
statements depends on a far greater understanding of the
statistical relationship between those variables than most
states probably have been able to determine.

Teaching the Test

Pupils and their teachers who participate in the
standardization of an achievement battery have not had an
opportunity to see or study the specific test questions used.
Thus, having no practice on the specific test questions is one
of the stringent preconditions of the standardization process.
Subsequently, when these norms are used to interpret the scores
of pupils who have been drilled with the exact test questions,
the result is an overrepresentation of the amount of knowledge
and skill possessed by such pupils. Likewise, when the scope of
the curriculum is narrowed to encompass primarily the
objectives measured by the exact test questions, the relative
standings of the pupils who experienced the restrictive program
of study will be overestimated.

No publisher condones this use of tests, and few teachers
probably follow such abominable practices. Those who do are
nearly always motivated by significant negative consequences
associated with scores that might turn out to be below
expectation (not always synonymous with national average).
Unfortunately, for some educators, job retention and salary
increases are tied directly to the test scores of their pupils.
The authors of the ITBS have always decried the use of
achievement scores for such purposes and instead have
campaigned for the use of these scores to improve instruction
directly.

If certain tests are to be used strictly for accountability
purposes, their security must be ensured so that the scores
that result will be valid for that purpose. The dollars
required to assure states and districts that the test forms
they will use are secure would be far greater than the value of
the information derived from using the secure forms. Those
dollars would likely have greater and more visible impacts on
learning if devoted to direct instruction instead.

Score Analysis and Interpretation

With which norm group, pupils or schools (attendance centers),
should averages from State X be compared to interpret the
scores of pupils from that state? With which norm group, pupils
or schools, should averages from District A be compared? There
are only two choices, pupils and schools, because no publisher
provides norms for school districts or for states. This is a
fundamental issue currently facing the Council of Chief State
School Officers as they contemplate options for providing for
state-by-state achievement comparisons in the future. The
choice to be made is not a matter of personal preference but a
matter of the logical correspondence between the units to be
compared. That is, averages of school buildings should not be
referenced to a distribution of individual pupil scores,
district averages should not be referenced to the distributions
of either school building averages or pupil scores, and state
averages should not be referenced to any of these three
distributions. In view of the differences between these
separate distributions, it is most logical to reference a score
or average score to its own kind. When the most logical
referencing is not possible, appropriate caution should be
exercised.

The national pupil norm group includes pupils whose scores on a
test are as high as perfect (PR = 99) to those whose scores are
as low as zero or chance average (PR = 1). No school (building
or attendance center) is likely to have an average score that
is perfect or zero. In fact, on the ITBS and any other test
with recent school norms it is reasonable to expect that no
school will have a raw or scale score average higher than PR 88
or lower than PR 12 compared to the pupil distribution. Because
many school districts are single-grade-within-single-building
entities, the distribution of school district averages probably
would encompass the same range as the distribution of school
building averages. The school district distribution, however,
is likely to be markedly more leptokurtic and less variable
than the school average distribution. In terms of the pupil
distribution, the distribution of district averages might
range, effectively, between PR 75 and PR 25.
Finally, most of the state averages on a test for a given grade
might well have actual bounds that correspond to PR 60 and PR
40 on the pupil distribution.

Educational Measurement: Issues and Practice

TABLE 1

Percentages of State X Pupils Performing
Within Selected National Pupil Percentile Intervals

National    National                   Grade
percentile  percentage   K   1   2   3   4   5   6   7   8  Average
rank
90-99           10      20  21  24  20  21  20  20  21  19   20.7
75-89           15      22  24  23  24  23  25  24  24  23   23.5
50-74           25      26  29  25  27  28  28  29  28  28   27.4
25-49           25      19  18  18  17  16  17  17  17  19   17.6
10-24           15       8   6   7   7   8   6   7   6   7    6.9
1-9             10       5   2   3   4   3   3   3   4   4    3.4

Percentage above
national median         68  74  72  71  72  73  73  73  70   72
Percentage below
national median         32  26  28  29  28  27  27  27  30   28

Because norms for district averages or for state averages are
not available, districts and states often use the pupil and
school norms that do exist. When a district average is referred
to the pupil norms, it should be thought of as the score
obtained by the average pupil in the district. We might find,
for example, that the average pupil in District A scored higher
than 63% of pupils nationally. Using the same rationale and the
estimate given above, the average pupil in most states is not
likely to exceed PR 60 or fall below PR 40. The value of such
information is highly questionable.

A matter related to this general issue of analysis concerns the
methods of computational precision used to aggregate and
convert scores. As an example of the problem, a grade 4 school
average GE composite score of 42.0 (obtained in the fall) on
the ITBS has a PR of 46, and a score of 43.0 has a PR of 53. By
interpolation and rounding, an average GE of 42.5 corresponds
to a PR of 49.5 or 50. If GEs are rounded before converting to
PRs, a 42.5 could be treated as a PR of 46 or 53, depending on
the rounding convention adopted. Of course, this illustration
underplays the magnitude of the distortion that could result
with distributions of either school district or state averages.
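
The rounding example above can be sketched in a few lines. This
is only an illustration of the arithmetic: the two (GE, PR)
pairs are the ones quoted in the text, while the function names
and the simple two-point lookup are assumptions made for the
example, not a publisher's conversion procedure.

```python
import math

# The two tabled points quoted in the text for a grade 4 fall school average.
GE_LO, PR_LO = 42.0, 46
GE_HI, PR_HI = 43.0, 53


def pr_by_interpolation(ge: float) -> float:
    """Linearly interpolate a PR between the two tabled GE values."""
    frac = (ge - GE_LO) / (GE_HI - GE_LO)
    return PR_LO + frac * (PR_HI - PR_LO)


def pr_after_rounding_ge(ge: float, round_half_up: bool) -> int:
    """Round the GE to a whole tabled value first, then look up its PR."""
    rounded = math.floor(ge + 0.5) if round_half_up else math.ceil(ge - 0.5)
    return PR_HI if rounded >= GE_HI else PR_LO


print(pr_by_interpolation(42.5))          # interpolate, then report
print(pr_after_rounding_ge(42.5, True))   # round the GE up first
print(pr_after_rounding_ge(42.5, False))  # round the GE down first
```

The same average GE lands on three different PRs depending only
on when and how the rounding is done, which is the distortion
the paragraph describes.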

Although it is in the best interest of both publishers and test
users to have tests and scores used properly, neither can
ensure that the other will do its part willingly and
unselfishly. Publishers must be counted on to standardize and
analyze results in professionally acceptable manners. They must
guard against potential misuse by informing educators of the
intended uses of the tests they publish and warn against the
possible misuses that might be anticipated. Publishers must do
their utmost to provide test materials only to those who are at
least minimally qualified to handle the tests and scores in a
professional way. State directors, superintendents, teachers,
school boards, and the public, generally, do not have the
resources to monitor the effectiveness of publishers in
attending to these obligations.

Summer 1988

Publishers, on the other hand, cannot monitor the use of their
instruments effectively to curtail misapplication, misuse, or
misinterpretation. Often after the fact, a publisher can
recognize inappropriate use, whether intentional or
unintentional, and attempt to persuade the user to modify a
proposal or report. Some school districts perform extensive
audits to ensure that students who were to be tested in each
attendance center were actually tested. Some districts also
audit results and retest suspect groups. But for the most part,
publishers are not aware of and have no control over school
districts' test administration conditions, the students
included in summary data reported to the public, or methods
used to synthesize data to make test results more palatable for
less sophisticated consumers.

Most test authors and publishers go well out of their way to
comply with the standards for educational and psychological
tests adopted by the profession. Test score users (teachers,
administrators, legislators, and other public groups) tend to
know far less than they should about the nature of tests or the
principles with which test makers intend for scores to be used.
We should not denounce a test because a state committee uses
the wrong norms or incorrect statistical analysis procedures in
reporting. Likewise, we should not blame users for results
based on shoddy standardization procedures or on inadequate or
deceptive descriptions of such procedures.

Finally, publishers are obligated to clients to maintain the
confidentiality of test data. It has been and should continue
to be each client's decision to release test data and to
determine the nature of any data to be released. Reporters,
citizens, citizens' groups, and others who wish to obtain test
data should respect this publisher-client relationship and seek
release from the school district or state, depending on their
level of interest and the dictates of state law.

A Sample Reporting Method

We recommend an approach like the one described below for
states that wish to describe the achievement levels of their
pupils in relation to pupils in a nationally representative
norm group. Exactly the same procedures could be used with
school building (attendance center) data. Table 1 shows
national PR ranges in the first column and the corresponding
percentages in the second column. The body of the table shows,
separately for each grade, the percentage of pupils in State X
that obtained national PRs in each range. The last column shows
the row averages of the percentage values. (Note that these are
percentages and not percentile ranks and, consequently, it is
acceptable to average them.) The bottom two rows indicate,
again by grade, the percentages of pupils above and below the
national median. A histogram with one distribution superimposed
on the other or a bar graph would provide a visual display of
the same information.

The main advantage of this method of reporting compared with
reporting simply the percentage scoring above the national
median is obvious. Between-grade differences and similarities
can be examined, but most important, discrepancies from the
national distribution can be accounted for in each of several
segments of the distribution. If all we know is that 72% are
above the national median, we do not know if the "extra" 22%
are mostly located very near the median, mostly spread through
the upper half, or mostly concentrated in the tail. Also, we do
not know if the extra 22% are shifted from the lower tail, from
throughout the lower half, or from just below the median.
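
A small sketch can show where the "extra" pupils sit. It takes
the national percentages and the State X row averages as read
from Table 1 (the OCR of that table is imperfect, so treat the
figures as illustrative); the names `excess_by_interval` and
`extra_above_median` are invented for this example.

```python
# National percentages per PR interval and State X row averages,
# as read from Table 1 (Average column); illustrative values only.
INTERVALS = ["90-99", "75-89", "50-74", "25-49", "10-24", "1-9"]
NATIONAL = [10, 15, 25, 25, 15, 10]
STATE_X = [20.7, 23.5, 27.4, 17.6, 6.9, 3.4]


def excess_by_interval(state, national):
    """State percentage minus national percentage for each interval."""
    return [round(s - n, 1) for s, n in zip(state, national)]


excess = excess_by_interval(STATE_X, NATIONAL)
for label, diff in zip(INTERVALS, excess):
    print(f"PR {label:>5}: {diff:+.1f} points vs. national")

# The first three intervals lie at or above the national median.
extra_above_median = round(sum(excess[:3]), 1)
print(f"Extra share above the median: {extra_above_median:.1f} points")
```

With these figures most of the surplus falls in the top two
intervals rather than just above the median, which is the kind
of detail a single "percent above the median" figure hides.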

Many districts use a reporting procedure similar to the
reporting scheme described above. We recommend that such
tabular data be supplemented with at least the following sorts
of information: testing date, test form and level(s) used, type
and date of the norms used, and percentage of eligible students
tested.

Riverside Publishing Company and its representatives do not
believe that the average pupil in every state has scores above
the national median on the ITBS. We are confident in our
standardization procedures and have subjected those procedures
to public scrutiny in detail in the Manual for School
Administrators. We have updated our norms at least every 7
years and, when achievement showed a pattern of increase in the
early 1980s, new norms were obtained even though new test forms
were not introduced. We are making plans to provide annual
national norms updates for next year. Our manuals caution users
about appropriate use of norm groups for varying purposes. Our
hope is that the issues raised above will cause FFE and state
and district test coordinators to reassess their analysis and
reporting procedures to ensure that conclusions reached are
based on a valid foundation rather than data of questionable
origin and manipulation.

References

Hieronymus, A. N., & Hoover, H. D. (1986). Manual for school
administrators: Iowa Tests of Basic Skills. Chicago: Riverside.



A Response to John J. Cannell 



Joanne M. Lenke and John M. Keene
The Psychological Corporation 



Two representatives of The Psychological Corporation present
their reactions to the Cannell report and call for better
explanations for the public of the meaning and limits of
norm-referenced scores.



In recent years, public attention has focused on standardized
achievement test results. Test results, which are intended to
describe the performance of individuals in relation to one
another, are now often used to describe the performance of
groups of students. In a report entitled "Nationally Normed
Elementary Achievement Testing in America's Public Schools: How
All 50 States Are Above the National Average," John Jacob
Cannell attempts to cast doubt on the validity of the
information being reported to describe the achievement of
students as aggregated at the state and/or district level. The
report states, "These tests allow all the states to claim to be
above the national average. The tests allow 90% of the school
districts in the United States to claim to be above average.
More than 70% of the students tested nationwide are told they
are performing above the national average."

In response to Cannell, it is fair to say that many states and
school districts report above-average performance in reading,
mathematics, and/or language in the elementary grades. We do
not believe that this is an attempt to misrepresent students'
achievement in the nation's schools. Let us examine three very
important issues related to the interpretation of this
information: (a) group performance relative to a national norm,
(b) local performance relative to national performance, and (c)
the stability of achievement test norms over time.

Joanne M. Lenke is Vice President, Measurement, at The
Psychological Corporation, 555 Academic Court, San Antonio, TX
78204. She specializes in test development and norming,
scaling, and equating tests.

John M. Keene is Director, Administration, at The Psychological
Corporation, 555 Academic Court, San Antonio, TX 78204. He
specializes in educational measurement.

Interpreting Group Performance
Relative to a National Norm

When a test is standardized or normed, the test is typically
administered to hundreds of thousands of students nationwide.
This norming sample is drawn to reflect specified demographic
characteristics of children attending school in the United
States. Such demographic characteristics include socioeconomic
status, ethnicity, region of the country, and size of school
district. Percentile ranks are then derived from frequency
distributions of individual students' scores at each grade.
Norms provide a mechanism for describing a student's
performance relative to that of other students in the same
grade from across the country at a particular point in time.

The use of these norms to describe group performance must be
interpreted carefully. For example, if a state's average score
in reading is at the 54th percentile, the proper interpretation
of this score is that the average or typical student in the
state performed better than 54% of the norming sample. It is
not appropriate to conclude that all students in the state are
above average in reading, that the state as a whole is above
average in reading relative to other states, or that the state
as a whole is above average in reading relative to the national
norm.

The approach used by some states and school districts in the
reporting of group performance is to report the percentages of
students scoring, say, "at or above the 50th percentile" or "in
the average and above average range." Although this method of
reporting is appropriate because it maintains the relationship
between individual performance and the national norms, the
reported percentages should be accompanied by corresponding
percentages for the national norming sample. Although it is
obviously the case that 50% of the national sample of students
scored at or above the national median at the time the test was
standardized, it may not be the case that 50% of the national
sample scored at or above the national mean raw score or
national mean scaled score. If the reporting metric is
something other than the percentage of students scoring at or
above the national median, the appropriate national comparison
should be provided so that proper inferences about the data can
be made.
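
The distinction between the median and the mean can be made
concrete with a toy sample. The ten scores below are
hypothetical and deliberately skewed; they are not data from
any test, and `pct_at_or_above` is a name invented for this
sketch.

```python
from statistics import mean, median

# Hypothetical, negatively skewed raw scores for a tiny "norming sample".
scores = [5, 15, 25, 30, 36, 37, 38, 39, 40, 41]


def pct_at_or_above(values, cut):
    """Percentage of values that are at or above the cut score."""
    return 100.0 * sum(v >= cut for v in values) / len(values)


at_median = pct_at_or_above(scores, median(scores))
at_mean = pct_at_or_above(scores, mean(scores))
print(f"at or above the median: {at_median:.0f}%")
print(f"at or above the mean:   {at_mean:.0f}%")
```

Because a few very low scores pull the mean below the median,
more than half of this sample sits at or above the mean, so a
local "percent above the national mean" needs the matching
national percentage beside it to be interpretable.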

Interpreting Local Performance 
Relative to National Performance 

It is unlikely that the demographic characteristics of the
students in any state or school district mirror those of the
nation as a whole; it is equally unlikely that the curriculum
of any state or local district is as diverse as that of the
nation as a whole. Furthermore, it is not necessarily the case
that the guidelines set forth by the test publisher with regard
to the testing of handicapped or limited English proficient
students in a norming program are the same as those used in
actual practice. If there were a state or district whose
demographic characteristics matched those of the nation, whose
curriculum was as diverse as that of the nation as a whole, and
whose administration guidelines and procedures were consistent
with those used by the publisher for the norming sample, one
would expect the average student in the group to score at about
the 50th percentile. To the extent that differences exist, we
must remind ourselves that when local group summary scores are
interpreted in reference to a national norm, the interpretation
has to be placed in the proper context, simply that of the
group's average student relative to the national norm. Because
it is unlikely that the students tested in any given state or
district are typical of the nation in all respects, it would be
unreasonable to expect any group to be at the national average.

Test purchasers, districts as well as state agencies, often
select tests through a process that examines the match between
the test content and the local curriculum. In many cases the
selected test is the one that best reflects the local
curriculum. Test users selecting tests on this basis may have
an advantage over the norm group because the test is likely to
be more valid for assessing performance in the local curriculum
than it is for assessing the performance of a national sample
of students being exposed to different curriculums, presumably
having somewhat different emphases.

The Stability of Achievement
Test Norms Over Time

Cannell's report suggests that the use of "old" norms is
partially responsible for high achievement test scores.
Presently, test publishers produce new editions of their tests
on a 7- to 9-year cycle, and current norms are provided with
each new edition. Because test adoption cycles do not
necessarily coincide with test revision cycles, it is
conceivable that the norms for a newly adopted test may be 2 or
more years old. Therefore, it is critically important that
empirical norming dates accompany the reporting of achievement
test results.

It is very encouraging to note that today's students are
performing better than their counterparts did in the late 1970s
and early 1980s. Evidence of this improvement in performance
can be found not only from research that test publishers have
conducted in equating newly published tests to previous
editions but also from a recent research study conducted by The
Psychological Corporation with the current edition of the
Stanford Achievement Test Series. First standardized in the
1981-82 school year, the Stanford Series was administered to a
nationally representative sample of 350,000 students in spring
and fall 1986. The sample was further stratified according to
"user" and "nonuser" groups, where users were defined as school
districts that had been using the Stanford in one or more
grades for at least one year in their districtwide or statewide
assessments. The results of this study revealed that "users"
outperformed "nonusers" and, more importantly, that "nonusers"
performed better than the original norming sample in
mathematics, reading, and the language arts in the elementary
grades.

Two important generalizations can be made from this research.
First, test scores do tend to increase when the same test
series is used year after year. However, this should not
necessarily be attributed to "teaching to the test"; rather,
the test results provide a focus on areas in need of
improvement. Second, educational achievement did improve from
1982 to 1986 in some subject areas in the elementary school
grades. Therefore, more current norms for the Stanford Series
have been developed and are available to users of the battery.

During this time of educational improvement, it is important
not to lose sight of the fact that use of the same norms over a
period of years enables the test user to demonstrate
improvement relative to a constant reference group. Even if it
were economically feasible for test publishers to produce
representative national norms more often, frequently updated
norms represent a "moving target," where educational gains (or
losses) would be masked by the relative nature of the
information. The level of achievement of students in the United
States has increased in recent years, and educators must have
the opportunity to demonstrate these gains in order to ensure
the necessary support of the local community in improving the
quality of education. The education of young people must
continue to improve, and norm-referenced achievement tests are
useful tools in this endeavor.

Conclusion

Because the public is expecting norm-referenced scores to
represent standards of performance, we, as educators, must
assist the public in becoming better informed about the
interpretation of test results. National normative data provide
extremely important information for making sound educational
decisions. The degree to which these decisions are defensible
depends on a clear understanding of the strengths and
limitations of the data.



The Time-Bound Nature of 
Norms: Understandings and 
Misunderstandings 



Paul L. Williams 
CTB/McGraw-Hill



Presenting a view from CTB/McGraw-Hill, the author discusses
the time-bound nature of test norms and argues that the
phenomenon of most elementary students' scoring above averages
from previous years' norms is a result of generally increasing
levels of achievement.



Recent interest in the topic of the time-bound nature of normed
scores has resulted, in part, from allegations made in a report
issued by the Friends for Education. The key element of the
argument put forth in the Friends for Education report is that
too many students appear to exceed the national average. Data
have been presented in the report which are said to show that
more states and school districts are scoring above average than
one might initially expect.

It is an interesting phenomenon that it is through the vehicle
of the Friends for Education report that the time-bound nature
of norms has received some measure of public attention. The
fact that norms have always been referenced to the year of test
standardization is something that has been so universally known
and understood by testing professionals that it has not had a
large measure of attention focused on it. Perhaps that will
prove to be an important singular contribution of this issue of
Educational Measurement: Issues and Practice.

The Cyclical Nature of Test and
Norms Development

The evolution of norm-referenced tests (NRTs) as valuable
assessment instruments has been characterized by the expansion
of the purposes for testing. In the earlier versions of NRTs
(in the mid-1960s to the mid-1970s), the primary purpose was to
provide accurate normative scores so that group and individual
comparisons could be made to a national profile of achievement.
Using this information, school administrators could evaluate
programmatic and individual strengths and weaknesses so that
appropriate instructional intervention and resource allocation
could be applied. Additionally, using multiple-year testing,
longitudinal trends in achievement could be monitored.

Paul L. Williams: Research and Measurement, CTB/McGraw-Hill,
2500 Garden Road, Monterey, CA 93940. He specializes in
educational testing and measurement.

An expansion of these purposes took place with the publication
of the California Achievement Tests (CAT), Forms C and D
(CTB/McGraw-Hill, 1977). This test battery, for the first time,
allowed scores for instructional objectives to be reported from
an NRT for individual examinees. Although earlier NRT test
versions did allow test administrators to use item analyses for
minimal diagnostic purposes, CAT C and D provided specific
instructional objective scores for the purpose of more
individualized instructional planning.

The schedule for the publication of norm-referenced tests has
followed a basic, industrywide cycle of between 5 and 8 years
for the same test series. In the instance where a test company
has more than one NRT series, such as CAT and the Comprehensive
Tests of Basic Skills (CTBS), publication is staggered so that
one test of the series is published about every 3 or 4 years.

This cycle has been dictated by several factors. The first
factor has been the speed with which curricular changes take
place in the nation's schools. NRTs are designed to reflect the
predominant achievement outcomes and curricular trends in the
nation's schools. When a new form of an NRT is developed,
content considerations are of paramount importance. Although
curricular trends have a major impact on the content of NRTs,
these trends do not change so fast in the schools that more
frequent revisions of a test series would be justified based
solely on them.

At the time an NRT is revised, the collection of data for the
generation of new national norms takes place. Using a national
probability sample, data are collected for several hundred
carefully selected school districts and hundreds of thousands
of students. Based on this carefully selected stratified
sample, normative scores are developed.

Each of the derived scores that emerge from the standardization
process, including percentile ranks, grade equivalents, and
normal curve equivalents (NCEs), has a predefined relationship
to the characteristics of the norm group. Thus, at the time the
test is normed, 50% of the examinees will exceed the 50th
percentile and the same percentage will fall below the 50th
percentile. Derived score tables for the test battery are
produced, and all scoring of student tests is referenced to
these tables until the battery is either revised or, in rare
instances, renormed with no change in the content of the test.

Data from national probability samples are not usually
collected for a test more often than every 5 to 8 years because
it is impractical and economically infeasible to do so. It
would not be reasonable to ask or expect schools to administer
tests to large numbers of students every school year in order
to develop yearly norms based on a national probability sample.
The cost of such testing would have to be passed on by the
publisher to the schools and would add substantially to the
cost of school testing programs.

In summary, most large test publishers follow the common and
decades-old industry practice of revising and standardizing
their achievement tests about every 8 years. The content is
updated to reflect current curricula and instructional
practices, and new norms are developed so that the test
reflects levels of achievement that prevail during the school
year in which the test is standardized. The dates of
standardization are given wide publicity, and all purchasers of
the test are aware of these dates.

Proper Interpretations of National Norms

Because norm-referenced tests are not normed yearly on a national
probability sample, changes in national achievement between the
norming years will be reflected in the norm scores for groups of
students. For example, if national achievement levels decrease
between normings, as they did from the late 1960s to the
mid-1970s, students' norm-referenced scores will decrease and more
students will fall below the median (50th percentile) score
established when the test was normed. On the other hand, when
national achievement levels increase between normings, more
students will exceed the median established when the test was
originally normed. Regardless of the direction of national
achievement trends, when a test is renormed, exactly half of the
students will fall above and half will fall below the newly
established median.

At this time, national achievement indicators all point to the
fact that student achievement is generally on the increase. This
increase is documented by the National Assessment of Educational
Progress (NAEP), the Scholastic Aptitude Test (SAT) results, two
Congressional Budget Office reports (1986, 1987), and data
collected during recent test normings by CTB/McGraw-Hill (1985,
1987, 1988).

Thus, during a time of increasing national achievement, the
students' normed test scores will rise between norming periods.
More students will score above the median score established during
norming than will fall below it. This confirms the sensitivity of
the test norms to changes in achievement, one of the tests'
primary functions. These normed test scores are valid measures of
student growth. Although the reference year for the scores will be
prior to the year in which the test scores are reported, the test
scores provide accurate program and student information. The fact
that the norm scores themselves refer to norming that took place
during an earlier year in no way compromises the major purposes
for administering an NRT or the usefulness of the scores for
program evaluation, student instructional planning, or the
monitoring of longitudinal trends. When interpreting the scores,
the test user must simply be aware of the year that the tests were
normed and the general direction of national achievement trends.
Interpretive guidelines are found in relevant test-related
materials produced by most publishers.
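The drift between normings can be illustrated numerically. The
sketch below is only an illustration under stated assumptions: it
supposes that scores in the norming year are normally distributed
and that the whole distribution later shifts by a fraction of a
standard deviation. Neither the normal shape nor the 0.2 SD shift
comes from the article, and the function name is hypothetical.

```python
from statistics import NormalDist

def share_above_old_median(shift_in_sd):
    """Share of current students scoring above the norming-year median.

    Assumes (hypothetically) normally distributed scores whose mean has
    moved by shift_in_sd standard deviations since the test was normed.
    """
    current = NormalDist(mu=shift_in_sd, sigma=1.0)
    old_median = 0.0  # median of the norming-year distribution
    return 1.0 - current.cdf(old_median)

share_at_norming = share_above_old_median(0.0)   # exactly one half
share_after_gain = share_above_old_median(0.2)   # more than half
share_after_drop = share_above_old_median(-0.2)  # fewer than half
```

Under these assumptions a gain of 0.2 standard deviations puts
roughly 58% of students above the old median, while renorming
would return that share to one half.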

The Friends for Education report has received attention primarily
as a result of its improper interpretations of score distributions
for






norm-referenced tests between renorming years. The sensational,
and apparently illogical, phenomenon of having too many students
above the national average is the basis for the criticism leveled
at the testing community by the report. This is a point that
should be elaborated upon, because it may be misunderstood by
others as well.

A naive interpretation of what an average (mean) represents is
that half of the scores in a distribution will fall above and half
will fall below the average. Although this is a common
interpretation, it is not statistically correct. The report
suffers from this misunderstanding, as illustrated by the
following quote: "Standard principles of mathematics make it
difficult for more than one half of any group to be above average"
(Cannell, 1987). There is no mathematical principle that would
cause this to be so. Depending upon the shape of the distribution
of scores and the measure of central tendency that is selected to
describe the scores, more or fewer than half the scores may be
above or below the measure of central tendency. For example, the
mean, or arithmetic average, does not necessarily split a
distribution of scores into equal halves. An average that splits
the distribution evenly will occur only in a symmetrical
distribution. If the distribution is skewed, there may be many
more scores above or below the average depending upon whether the
distribution is negatively or positively skewed. The median (the
50th percentile), on the other hand, does separate a score
distribution into equal halves. Thus, there is no a priori reason
to believe that norm-referenced scores should separate the
examinees into two equal halves, particularly during times of
changes in national achievement trends.
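The distinction between the mean and the median can be checked on
a small example. The scores below are invented for illustration
and are not taken from any test discussed here; one very low score
gives the set a negative skew.

```python
from statistics import mean, median

# Hypothetical scores with one very low value (a negatively skewed set).
scores = [40, 85, 88, 90, 92, 95, 97]

avg = mean(scores)    # arithmetic average, pulled down by the low score
mid = median(scores)  # 50th percentile of this small set

above_mean = sum(1 for s in scores if s > avg)
above_median = sum(1 for s in scores if s > mid)
below_median = sum(1 for s in scores if s < mid)
```

In this set six of the seven scores lie above the mean, while the
median still leaves three scores on each side.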

Extended Extrapolations

The time-bound nature of normative interpretations is relatively
straightforward to describe and understand. What becomes more
difficult to evaluate are the social and educational implications
that might be drawn from acknowledging that actual score
distributions may differ increasingly from the published norms as
a result of changes in achievement over time.

One way to determine the amount of change in achievement over time
might be to survey states and school districts and, based on the
aggregation of scores, determine the number of states and
districts reporting above "average" (50th percentile) scores.
Additionally, it might be possible to determine the proportion of
students above the 50th percentile and the average national
student score. Finally, to illustrate the rapidity with which
standardization score distributions change, data could be
collected the first year after norming and then an average of
state and district scores could be calculated.

This task would be very difficult to do correctly. Different
states and districts use different tests that are not on a common
scale. The scores from all states and districts would have to be
collected, placed on a common scale, and analyzed appropriately.
There is no evidence that this has ever been done correctly.

This brute-force approach need not be the only mechanism to
determine achievement trends over time, nor is it the best way.
Achievement changes between normings are documented by the major
publishers, and this information could be directly examined.

A third approach intended to monitor national achievement trends
might be NAEP. But NAEP is also an imperfect panacea for
determining achievement growth. There will always be
quality-control issues, as evidenced by questions about recent
NAEP survey results. NAEP is a valuable indicator of achievement
trends, but like any method it is not absolutely perfect.

The fact is that various sources of information must be
synthesized so that a complete picture of national trends can be
obtained. Each type of assessment, via NRTs, CRTs, NAEP, or
others, attempts to answer different questions in different ways.
Each is valuable in providing a piece of the picture on the status
of student learning. It is when we learn how to make artful
syntheses that all of us will be closer to determining the status
of achievement in America's schools.

It is unfortunate that during a time when national achievement
trends are moving upward some might use that fact to suggest that
one of the reasons for the upward movement is inadequate norming
by test publishers and inappropriate teaching of test content by
users for self-serving purposes. These are serious charges that
should not be made without supporting evidence. It must be stated
that there is no logical reason why test publishers would wish to
engage in inadequate norming. Test publishers have every incentive
to make sure that their tests are completely objective and are
administered properly and that their integrity as valid measures
of performance stands unimpeached. Without such quality, test
publishers would quickly find themselves with no customers.

Conclusions

To be sure, some of the concerns raised by Dr. Cannell are shared
by all in the educational community. The time-bound nature of
norms may not be well understood by some school personnel and the
public. There may be abuses of tests and breaches of security.
Some teachers and administrators may indeed disclose too much test
content to the students. But the overwhelming majority of the
educational community is doing its very best to administer tests
and report test scores in a responsible fashion.

At least two examples of this come prominently to mind. The first
is the way in which test publishers equate alternate forms within
the same test battery over time. Thus CAT Forms C and D (1977)
were equated to CAT E and F (1985). Similarly, equating is done
between different test batteries developed by the same test
publisher, as was the case for CTBS Forms U and V (1981) and CAT E
and F (1985). These equatings allow the test user to move from one
version of a test to another and preserve longitudinal
comparisons. The recent trend that has been observed in these
equatings is that the derived scores from the most recently normed
test are lower than for the earlier normed test. This is
predictable in times of increasing national performance. The
opposite would be true if national achievement trends were on the
decrease. Explanatory material that helps the practitioner
understand this phenomenon is



always provided.

The second example relates to the Annual National Normative Trend
Data (NTD) published by CTB/McGraw-Hill. Research on this project
began in 1984, when an emerging customer need was identified by
the company. Customer comments about the desirability of obtaining
more recent normative data were noted in market research efforts.
Such data could be used to amplify the standardization norms and
provide a more complete picture on the progress local school
districts were making in their instructional efforts. After 3
years of research the NTD service was offered to CTB customers.
Score reports have been made available on an annual basis, for the
standardization year as well as for the most recent norming. This
service is a response to those educators who have been concerned
about the time-bound nature of norm-referenced scores.

The test companies do their best, through many vehicles, to assist
the test consumer in being a responsible user of test results.
Indeed, reasonable testing programs, effectively implemented, are
one of the reasons that achievement is increasing and that we are
not currently in the decline phase that manifested itself in the
late 1960s to the mid-1970s.

The assertion that scores are on the increase does have merit.
Perhaps the positive side of this phenomenon should be stressed
more. States and local school districts have committed
considerable resources to improving the achievement levels of
their students. All indicators of student achievement appear to
converge on this fact, particularly for the elementary grades. The
American public should be gratified that achievement is
increasing.

Cannell (1987) charges that "inaccurate initial norms and teaching
the test," rather than improved achievement, are reasons for
improving scores on nationally normed tests. The problem with
these allegations is that there is little, if any, evidence to
support them. To the contrary, the body of independent evidence
suggests that test norms provide a valid and useful reference in
both the norming year and in subsequent years and that achievement
at the elementary level has been increasing. If indeed there exist
instances of abuse of test norms and of misunderstanding of their
meaning by educators or the public in general, then the proper
remedy should be to correct those instances rather than to make
rash allegations about the adequacy of test norms or questionable
teaching by educators.



References 

Cannell, J. J. (1987). Nationally normed elementary achievement
    testing in America's public schools: How all fifty states are
    above the national average. Daniels, WV: Friends for Education.
Congressional Budget Office. (1986). Trends in educational
    achievement. Washington, DC: Author.
Congressional Budget Office. (1987). Educational achievement:
    Explanations and implications of recent trends. Washington, DC:
    Author.
CTB/McGraw-Hill. (1977). California Achievement Test, Forms C and
    D. Monterey, CA: Author.
CTB/McGraw-Hill. (1981). Comprehensive Test of Basic Skills, Forms
    U and V. Monterey, CA: Author.
CTB/McGraw-Hill. (1985). California Achievement Test, Forms E and
    F. Monterey, CA: Author.
CTB/McGraw-Hill. (1987). Annual national normative trend data.
    Monterey, CA: Author.
CTB/McGraw-Hill. (1988). Annual national normative trend data.
    Monterey, CA: Author.



SRA Response to Cannell's Article 

Audrey L. Qiialls-Payne 
Science Research Associates



The author defends SRA's norms, discusses some of the
difficulties in pursuing Dr. Cannell's proposals, and points
out that we need to monitor not just student achievement
levels but also trends in curriculum.

Summer 1988 



Science Research Associates (SRA) recognizes the concerns
expressed in John Cannell's article, "Nationally Normed
Achievement Testing in America's Public Schools: How All 50 States
Are Above the National Average." We differ, however, in our
assessment of the situation and the proposed alternatives.
According to the article, most schools in the nation perform at or
above average on commercially available tests. This finding, as
noted by Dr. Cannell, is not consistent with statistical theory,
which says that half the students should be above and half below.
Dr. Cannell expresses the opinion that this inconsistent
statistical phenomenon results from using older tests, older
norms, teaching to the test, statistical manipulation of the data
by publishers, excluding special
Audrey L. Qualls-Payne, Science Research Associates.






education students from the calculation of group averages, and
inaccurate norms. He goes on to suggest that these problems can be
eliminated by the use of one achievement test in all schools
across the country with the concurrent development of annual
norms. Our purpose is to examine Dr. Cannell's conclusions and
offer alternatives to some of the issues raised in his report.

SRA's national norms are reliable and accurate indicators of
national student performance at the time of standardization. The
charge of statistical manipulation of data appears to result from
Dr. Cannell's apparent misunderstanding of the purpose of the
various types of test scores and subgroup norms. Schools may wish
to compare their students' performance with, in addition to that
of the national group, that of groups more similar in structure
and student composition. For example, a nonpublic school may want
to compare their students' performance with that of students from
other nonpublic schools. The various test scores, in addition to
status scores (i.e., percentiles and stanines), are offered to
meet the many needs of our customers. Normal curve equivalents
(NCEs) are required for Chapter I program evaluation. To assess
longitudinal growth and determine functional levels, developmental
scores, for example, standard scores and grade equivalents, are
needed.

Dr. Cannell's alternative to the various standardized achievement
tests is a national achievement test, which would require at least
two major actions. First, this national achievement test would
have to be normed annually with a representative group of students
to have yearly norms. Second, new test forms would be needed for
each administration to eliminate possible problems of teaching to
the test and test security.

A project of this magnitude and complexity would be very difficult
logistically and very costly. Two major logistic problems would be
(a) obtaining curricular consensus on the test content and (b)
obtaining or mandating national participation.

If yearly new forms are not an option but annual norming is, and
if there truly is a substantial amount of teaching to the test,
the problems noted in Dr. Cannell's analysis may not go away. If
new forms of achievement tests are developed each year, thereby
increasing test security, the need for annual norms diminishes
significantly. Based on Dr. Cannell's analyses from schools with
tight test security and literature on student growth, drastic
shifts in student performance from one year to the next are rare.
From a psychometric point of view, new norms are needed only when
there is a significant shift in school curriculum and/or student
performance. As opposed to developing and standardizing new forms
each year, a mechanism is needed to monitor changes in school
curriculum and student performance. Whenever there is a change in
either curriculum emphasis or achievement levels, new test forms
should be developed and standardized. If the change is strictly a
shift in student achievement, renorming is required. As a
publisher, we must base our decision on when to issue new
forms/new norms on a systematic monitoring system.

There are several ways to monitor student progress. One way to
accurately spot when significant changes are taking place is to
track student achievement on a regular basis (i.e., annually). The
entire user group could be used for this purpose. The monitoring
process should be capable of producing user-based norms, which can
then be made available to all customers as an optional service in
addition to the national norms.

There is at least one major problem with the user-based monitoring
system. If the user sample is biased and unrepresentative of the
national student population, significant changes noted in the user
sample may not truly reflect changes at the national level. One
way to resolve this problem would be to select a subset of schools
from the user group and use it to monitor changes in curriculum
and student achievement annually. The selected schools should be
representative of the national population of schools with respect
to geographic region and racial/ethnic and socioeconomic status.
Once a set of schools is selected for this purpose, students in
these schools can be tested on an annual basis and norms can be
developed. As in the previous method, annual norms will be made
available to customers as an optional service. Because of the
representativeness of the schools selected for monitoring, one
can, with a high degree of confidence, generalize results from
this set of schools to the U.S. population of schools.
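One way such a representative subset might be drawn is by
proportional stratified selection. The sketch below is an
assumption-laden illustration, not SRA's procedure: the function
name, the single stratifying variable (region), the national
shares, and the school identifiers are all hypothetical.

```python
import random
from collections import defaultdict

def select_monitoring_schools(user_schools, national_shares, sample_size, seed=0):
    """Pick user schools so the stratum mix follows national_shares.

    user_schools: list of (school_id, stratum) pairs (hypothetical data).
    national_shares: dict mapping stratum -> share of the national population.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for school_id, stratum in user_schools:
        by_stratum[stratum].append(school_id)

    selected = {}
    for stratum, share in national_shares.items():
        quota = round(sample_size * share)  # schools wanted from this stratum
        pool = by_stratum[stratum]
        if quota > len(pool):
            raise ValueError(f"not enough user schools in stratum {stratum!r}")
        selected[stratum] = rng.sample(pool, quota)
    return selected

# Hypothetical user group that is concentrated in one region.
user_schools = (
    [(f"N{i}", "Northeast") for i in range(60)]
    + [(f"S{i}", "South") for i in range(25)]
    + [(f"W{i}", "West") for i in range(15)]
)
national_shares = {"Northeast": 0.2, "South": 0.5, "West": 0.3}

panel = select_monitoring_schools(user_schools, national_shares, sample_size=20)
panel_counts = {stratum: len(ids) for stratum, ids in panel.items()}
```

Although 60% of the invented user group is in the Northeast, the
selected panel follows the assumed national shares. A real design
would stratify on several variables at once (region, racial/ethnic
composition, socioeconomic status), as the text describes.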

Because SRA recognizes the value of a monitoring system, we are
already in the developmental stage of implementing such a program.

Educational Measurement: Issues and Practice






Chairman Hawkins. Well, thank you, Doctor. 

The next witness is Dr. Walter Faithorn, Jr., retired business ex- 
ecutive and volunteer teacher at the University of the District of 
Columbia. 

Mr. Faithorn. Yes, sir. Thank you, on behalf of Friends for
Education which is the organization that I'm speaking for today. I'm
not a doctor, Congressman Hawkins. I'm hopelessly outclassed in
terms of professional lingo here today.

I'm from the Chicago manufacturing sector and my experience
with the problem of the education of our children most recently in
my career comes from the difficulty we've had in hiring people for
our factories in Chicago who can read and write, and I mean
people who have high school diplomas.

We went through many years of endeavoring to recruit factory
employees in Chicago from those sections of town that had
extremely high unemployment and poor economic conditions. We
found lots of people who were willing to work and, as I said a
moment ago, had high school diplomas -- many of them -- but they
couldn't read the buses and the el-trains and the subways in order
to get from where they live in south side of Chicago up to the near
north side.

We started classes at my company to teach people how to read.
But I didn't expect that was going to lead me to this room today.
I've been retired for a few years, and I'm a volunteer teacher of
English at the University of the District of Columbia.

I was asked by John Cannell, who is the President of Friends for
Education, if I would substitute for him today. I'm sorry that he
isn't here. I'm sorry for your sakes, as well as for my own. So I
prepared a written statement which I submitted in many copies
yesterday, and I was told, by the way, by one of your staff that I
shouldn't read that today, I should be much more informal.

So let me begin by just quickly reviewing how the Friends for
Education organization got started. John Cannell, who's just a
youngster in my opinion -- he's in his middle forties -- and a kid I
helped raise when he was just a youngster, is a physician -- an
M.D. -- and was practicing in West Virginia. He opened a clinic up
in the mountain woods territory for people who had never been
able to get to doctors before. He lived in Beckley, West Virginia
where his kids were going to public school.

He observed that they didn't seem to know from Shinola what
was going on in the world and they weren't learning anything.
They didn't know whether France was the capital of Paris or Paris
was the capital of France and all sorts of horror stories like that
that I'm sure you've heard and read about from lots of other
sources.

But at any rate, he went to see the school principal to talk about
it. The principal and his staff were stunned that Dr. Cannell was
unhappy because, as they pointed out to him, their school tested
way above the National average. He was upset by this. The
implications being that the National average was so terrible that he
started writing school boards in Kentucky and West Virginia and
then expanded his effort to include all 50 states over a period of
several months, and found that virtually every school was
reporting their schools and the students being above the National
average. So he decided it was really a big fraud and that's what
started his organization.

As I pointed out in what I submitted to you, he brought all this 
to the attention of the Department of Education and talked with 
the then Secretary Bennett who was quite skeptical of his findings 
at that time and authorized an investigation within the Depart- 
ment as to whether or not Cannell's claims were true, and found 
that they were. And lots of other sources have corroborated what 
Friends for Education discovered about these so-called national 
averages. 

Cannell went on to pursue the matter and decided that not only
were these norm-referenced tests misleading in terms of where
students and schools stood, but that there was actually a sort of a
conspiracy between the publishers and the school authorities,
particularly superintendents, to continue these kinds of tests which
made everybody so happy -- the parents and everyone -- because it made
the schools look so good. Then the kids who were all testing above
the National average when they came to take SATs or the Army
tests, just were doing miserably and there was understandably a lot
of confusion that resulted from that.

Well, all of these are well-known facts, I am sure, particularly 
among the professionals who are here today, but I thought it worth 
mentioning because that's what has made our organization as rank 
amateurs so upset with the present scene. 

We believe very much in the points that were made by Dr. 
Haney. Nothing that he said about the need for much better test- 
ing—all of that is what Friends for Education stand for and are en- 
deavoring to help expedite. 

We also are very impressed with the work that's being done by
the National Assessment of Educational Progress, which is within
the Department of Education but has an outside governing board
which seems to be giving it excellent impetus toward progressing
this question of more intelligent testing.

We feel that the publishers who Dr. Faldet represents are
extremely good businessmen. They've developed, as typical of good
businessmen -- and I can speak from personal experience on this
point -- a cozy relationship with their customers who are the school
boards and superintendents, and they've promoted their product
extremely effectively to the point where there are some 50 million
tests a year of this sort given and the burden on children, as Dr.
Haney has pointed out, is simply enormous. We'd like to see that
greatly reduced.

Really, I don't have anything much more to say that will be of
much value to your committee, but I'll be glad to answer what
questions I can.

[The prepared statement of Walter E. Faithorn, Jr., follows:] 






Friends for Education, Inc. 
600 Girard, N.E.
Albuquerque, NM 87106
(505) 260-1745 



5 June 1990 



Walter E. Faithorn, Jr.
3800 Underwood Street
Chevy Chase, MD 20815



Augustus F. Hawkins, Chairman (California)
Committee on Education and Labor
U.S. House of Representatives
Washington, DC 20515

My dear Congressman Hawkins,

Thank you for inviting us to testify at the hearing on
"Testing/Assessment/Evaluation, etc." of the Subcommittee on
Elementary, Secondary and Vocational Education on 7 June 1990.

The views of Friends for Education on the subject of testing
are, I believe, pretty well known. That was my inference, at any
rate, from the comments of your administrative assistant,
Dr. Dandridge, in a brief telephone exchange we had about
arrangements for this meeting. I will summarize our position as
follows:

We believe that well-conceived, properly designed and
securely administered tests of students at a few crucial levels
during their elementary and upper school years are absolutely
essential in order for all to know what our children are learning,
whether it is as much as we believe they should learn, and what
schools are doing a superior, an average, or a poor job of
teaching. We believe this kind of testing is necessary to the
development and improvement of curriculum -- that good testing
drives good curriculum.

We are sorely disenchanted with what is going on by way of
testing, today, particularly the prolific use of commercially
prepared and distributed "standardized achievement tests."

In 1988 we reported to educators, generally, and to William
Bennett, then the Secretary of the U.S. Department of Education,
in particular that the results of these tests were routinely
compared to the scores of a "norm group" previously tested by the
commercial publishers. The "norm group" had taken the test cold.
Current scores of students now practiced in the same test are,
quite naturally, higher, so Johnnie comes home with a computer
printout that tells his parents that he is testing "above the
national average," that by implication his school, his teachers,








his principal, the district superintendent, even the school board
itself, are all "above the national average." Everybody is very
happy. And even better, everybody in every other school district
in all fifty states is equally happy because every kid in every
school is testing "above the national average." We contacted
more than 3,500 school districts in all fifty states and what
they reported to us said, in effect, that 70% of our school
children were testing above the publisher's "national norm" on
commercial norm-referenced achievement tests.

Secretary Bennett called this the "Lake Woebegone" effect;
he was at first skeptical of our report; he authorized an $80,000
study by his department to check out our story; his study
confirmed our findings.

Because it became immediately apparent that the widespread
use of these commercial, norm-referenced achievement tests so
profoundly affects the feelings of the public at large toward
their schools and all the personnel connected with them, these
tests became known as "high stakes" tests. When scores are high,
everyone's job is secure; if scores are low, heads might roll.
It also did not take a high order of perception on our part to
realize that "high stakes" testing requires high security in
handling them. Many and all manner of unsolicited letters began
to arrive, mostly from teachers, about how they were required to
"teach the test," to cheat by giving out answers before the
tests, by changing answers after the test, by keeping predictably
poor pupils from taking the test (invariably children from
minorities and/or other disadvantaged groups), all such
manipulations to better insure better results to be better than
the "national average."

We investigated these allegations, at least as many as we
could, thanks to a small grant from the Kettering Foundation, and
to our satisfaction found them generally true -- if anything,
understated -- just the proverbial tip of the iceberg. We think
all kinds of cheating is going on in respect to these tests and
we think the big, commercial publishers of these tests know it
and at best look the other way. I do not use the term,
"bureaucracy," pejoratively. Our whole society is a bureaucracy
-- big business, government, professions, nonprofit
organizations, public education; none of it could function
without bureaucracy. And the public school bureaucracy quite
naturally takes great comfort in being able to report "above
national average" test results.

The principal victims of this scam are, of course, the
children, and the most vulnerable of these victims are minority
children and those otherwise disadvantaged by poverty or other
calamity. And when Johnnie does really poorly on college
entrance and SAT exams and on Armed Services Vocational Aptitude
Battery scores -- when the emperor comes by without any clothes
on -- we all look at each other in puzzlement, and parents wring
their hands in confusion.






We think we should all be wringing our hands -- but more
because of imminent peril than confusion. Our kids compare
miserably with their opposite numbers in the world's other
leading industrial nations. Just to name a couple, how far
behind our two former mortal enemies is all this going to leave
us in a few short years?

The U.S. Department of Education will not carry its
investigation of our work beyond confirming, as it did, our "Lake
Woebegone" exposé of performance "above the national average."
In an effort to better prepare myself for this hearing, I met
with a group of senior officials in the office of the Assistant
Secretary for Educational Research and Improvement, and I was
told that our allegations of cheating, fraud, and deceit were of
an "anecdotal" nature and did not lend themselves to rigorous and
objective verification. These gentlemen went on to say -- and to
me this was the more important part of their answer -- that not
the Congress, nor the States, nor the local school boards (all
15,000 plus of them) want the U.S. Department of Education
messing around in matters of this sort -- telling them what they
are doing wrong, how this State compares to that State, or this
school district compares to that, etc. They didn't talk about
money, but I suspect that it is also quite acceptable to the
White House Chief of Staff, among others, that the Department of
Education not be making plans or noises about things that cost
serious money. We really wonder if President Bush believes that
the federal contribution of 6¢ for every dollar spent on public
education is enough.

Nonetheless, we are impressed by the work being done by the 
National Assessment of Educational Progress (NAEP) and its 
governing board, the National Assessment Governing Board (NAGB)
towards setting achievement levels defining what students ought 
to know at different grades. Because it seems to us that our 
kids are operating under an unreasonably onerous load of testing 
in their schools, today, we applaud the NAGB approach of limiting 
the testing they propose to the fourth, eighth and twelfth 
grades. We very much approve of the way in which they are going 
about developing a broad national consensus for defining 
standards to be used in accomplishing improvement. We are 
concerned, however, about a possible reduction in the rigor with 
which test security will be practiced. 

As many of you may know, Friends for Education is a small 
group of rank amateurs founded and energized by a young 
physician, John J. Cannell, in general practice in the back woods 
of West Virginia where he didn't think his kids were learning 
very much in the public school. When he talked to the principal 
about his concern, he was told that he should be happy, just as 
all the other parents were, because his school tested "way above 
the national average." That's how it all started. A few parents 
joined him, and because he could not afford the time or money to 
come to this hearing, I was pressed into service (as an old, old 
friend who helped him out a bit when he was a youngster). He is 
now living in New Mexico, is on the faculty of that University's 
medical school, and is also carrying on demanding post-graduate 
studies. 

I am even a ranker amateur, a retired business executive 
from the manufacturing sector in Chicago, now a volunteer teacher 
at the University of the District of Columbia, and a volunteer 
representative, today, for Friends for Education. (Permit me, 
please, to note, parenthetically, that the amount of work I have 
had to do in order to do any justice at all to this opportunity 
to meet and speak to this Committee has filled me with wonder at 
how Congress ever gets anybody to testify at hearings who is not 
well supported by staff, facilities, money and highly paid 
professional spokesmen.) 

I wish my classmate, Francis Keppel, were still on the 
scene. (One should not infer from that that we were close 
friends; actually we were mere acquaintances, but I admired him 
greatly.) I feel certain that there would be a much more 
aggressive attitude in the Department of Education today. 
Outrage and fury over the ridiculously heavy load of endless 
testing, and so much of it deceitful at that, would drive him, I 
think, to the doors of Congress for money and authority to do at 
least as much to protect children from fraud as a Department of 
Agriculture meat inspector does in his field. 

I think he would invite the Congress's attention to the 
recent T.V. edition of CBS's 60 Minutes which documented 
widespread cheating by school authorities in South Carolina and 
how the blame and punishment was put on teachers. Francis might 
well say to you, "If you won't let my department do anything 
about this, if you won't let me protect the first line of 
victims, our children, and the second line, our teachers, then 
you do it. It won't be the first time Congress has been goaded 
into action by a television program. When you want to, Congress, 
you can do some pretty heavy-duty investigating." 

Thank you very much. 

Respectfully, 
Walter E. Faithorn, Jr. 






Newsweek 



[...] education summit in Charlottesville, Va., a committee of 
governors and White House staff members has been meeting in 
Washington to define the nation's educational goals. The panel 
hopes to [illegible] month, just in time for President Bush to 
include it in his State of the Union address to Congress. That's 
the easy part [illegible] reducing the dropout rate and 
encouraging local creativity. But for once everyone also 
recognizes that platitudes won't be adequate to the task. The 
hard part will come later, when the committee is expected to 
propose a national yardstick that will enable Americans to know 
how their students and schools are faring. The big question, says 
committee co-chairman Carroll Campbell, the governor of South 
Carolina, is how do you measure success, against what test? 

For educators, particularly those [illegible]. Depending on the 
critic, the nation's vaunted standardized exams (a) measure the 
wrong skills, (b) distort classroom instruction, (c) falsely 
reassure parents, (d) discriminate against the underprivileged. 
The push for better tests comes from several sources. Elected 
officials are demanding more from the education mandarins and 
need a way to hold them accountable. Parents, from Miami to 
Chicago, are being invited to share in the management of 
neighborhood schools; they, too, are unwilling to rely on the 
tyranny of anecdote or true-false quizzes. And employers say 
they no longer want people who have mastered just the basics 
(although some days they wish they saw more of them) but they 
need people who can think. In short, the consumer movement has 
finally entered the schoolhouse. Newspapers rank schools and 
districts by their test scores; real-estate agents use test 
scores [illegible] 

rt» \>r rxpe"«.|>p hv>u»<-.|< ^►n,. 
•uperintfndpnta.Ar ^red 'or vor»-. 
and>jvhen an >e -nerit pia\ ' jr Ti^r 
vi.re* •sH»iornp>(-pp«rd ar i>d-v-»ii.--r 
>jrjtrs*r it he L r'.^pr»in ( ( .i.NraJv 
F^r^n«l »»a:e« < \e TjrnlJi >r; 
4ih,«ement e<t» Nir-ip ke Np» \ v 
ird J, tj'T J Jpvpiop their v)*n , 
(Hhe'^iaae 'ie nMour 'p^dinji ariTierLi i 
»rand» In additum -nanv locj JstrvN 
arQp>pn ndn idual «i houi» t^^u > a-d 
arJii«Mfui'Ti>ijn ^eii-i>»n \nil!iplevierj 
< -venment 9 \it ona, \aae90men >! 



We need to produce students who know how 
to think. And we need new tests to help us. 



[Edu]cational Progress (NAEP) regularly tests small random 
samples of fourth-, eighth- and 12th-grade students nationwide. 
A recent study by FairTest, a Boston-based advocacy group, found 
that U.S. public schools administered [illegible] million 
standardized tests to [illegible] million students in the 
[illegible] school year alone, an average of more than 
[illegible] standardized tests per student per year. But for all 
the many choices, says Ernest Boyer, president of The Carnegie 
Foundation for the Advancement of Teaching, "Most of our current 
efforts at assessment have been woefully inadequate, fragmented 
and even destructive." 



•raderru ^p- 

* ^Jifxp d,t s LjUt V\,*ev' n T«t ? f 
mhen . n » ^pdjlp-li r - 'r-n» a'-p- "\e\ 
rratp J "p* Kl t or 3 a»LJ \ r^e'\ 
«p\pn\p«n' In'hp-^par it.p <(..dpri»4rf 
vomparvd » "itonii ^taidjrd ha. ^ 
vime'ime^ t ,»rf jr -j * j do. wp i^t • 

^ ^p -je ' .1.1 iorr-ii*i inf 1 i str v' 
^^iv*t ^ tut dvi -fc > 'pr * 'I, u 1, 

cheating on a test. Teacher Nancy Yeargin, considered by 
colleagues to be one of the best teachers in the school, 
admitted that she had distributed the questions and answers two 
days before a standardized exam to two low-ability geography 
classes. She was fired and prosecuted under a South Carolina law 
that makes breaching test security a crime. Yeargin avoided 
spending 90 days in jail by pleading guilty last September and 
paying a $500 fine. Clever administrators needn't go that far; 
instead some simply encourage slower pupils to be absent on the 
day of an exam. 

A team of education professors from UCLA and the University of 
Colorado recently completed a more scientific version of 
Cannell's study that compared student test scores from various 
states. They found Cannell's research correct. But they don't 
have a simple solution. "The problem is that too much is being 
expected of these tests," says Robert Linn, a professor of 
education at the University of Colorado. 

The growing dissatisfaction with most standardized tests has led 
to a search for alternatives. For large-scale assessment, some 
experts favor expanding the federally funded NAEP. Over its 
20-year history, NAEP has earned an enviable reputation among 
educators. It's a test that combines multiple-choice, essay and 
problem-solving questions; it's given on a random basis to only 
a few kids in any class; teachers, then, can't "teach the test." 
And its norms are developed so carefully that they're trusted. 
However, until recently federal law prohibited the use of NAEP 
scores to compare states or districts. Congress finally 
permitted an experimental state-by-state comparison of scores 
for the eighth-grade math test in 1990. So far [illegible] 
states have agreed to participate. 

The NAEP board would like to expand the comparisons across the 
board, but some educators fear that will inevitably ruin the 
test. "If NAEP scores become so important to educators and 
policymakers that classroom instruction is tailored to it, its 
value as an indicator of achievement will be threatened," says 
Daniel Koretz, an analyst at the Rand Corporation. NAEP board 
chairman Chester E. Finn Jr. thinks the nature of the test 
procedure will prevent that from happening. "We've got very good 
judgment and we've [illegible] 






By Edward B. Fiske 

Garrison Keillor, the folk humorist, has provoked many a smile 
by his description of the mythical town of Lake Wobegon as a 
place where the women are strong, the men good-looking and all 
the children are above average. 

But looking closely at the results of the estimated [illegible] 
million standardized achievement tests taken by American 
schoolchildren every year, it seems that such fantasies are no 
longer a laughing matter. 

For several years virtually every state education department, 
and even the most [illegible] local school districts, have 
released standardized test scores showing that their children 
are reading, writing and calculating above the national average. 
Since this is by definition impossible, test makers and 
educators have been accused of playing statistical or 
educational shell games. 

Last week Chester E. Finn Jr., the assistant U.S. secretary of 
education in charge of research, called both sides into his 
office to explore the issue. He concluded that the standardized 
test scores used to evaluate public schools were not always what 
they appeared to be. "Maybe it's time for the Department of 
Education to do something by way of providing some information 
to consumers," he said afterward. 

What Finn described as the "Lake Wobegon effect" was first 
raised in a systematic way by Dr. John J. Cannell, a family 
physician in Beaver, W.Va., who was concerned about the problems 
of low self-esteem and [illegible] he saw in many of his teenage 
patients. "I noticed a discrepancy between their academic 
performance and the grade level to which they were assigned," he 
said. 

When Cannell heard a report one day from his state's Education 
Department that schoolchildren in West Virginia, which has one 
of the highest illiteracy rates in the nation, were performing 
above the national average, his [...] 



[...] was below the national average," he said. 

Participants in last week's meeting, called by Finn to explore 
Cannell's charges, said what struck them the most was that none 
of the two dozen people present, not even the test makers, took 
issue with his [findings]. 

"There's no dispute that test scores are rising," said David C. 
Deffley, general manager of CTB-McGraw-Hill, publisher of the 
California Achievement Test and other tests. "The dispute comes 
about why." 



* Definitions of "average" are out of date. The major tests are 
first given to a scientific sample of students around the 
country. Their scores become the benchmarks, or norms, for 
determining whether those who follow are scored above, at or 
below the national average. But norms for many of these tests 
have not been reset for six or seven years. Testers say that 
schools have been getting better and the average has been 
rising. Consequently many [...] be scored "below average" are 
still "above average" compared with the early 1980s sample 
groups. 

* Schools pick tests that match their curriculums. This means 
that their students, unlike many of those in the samples, will 
find a close fit between the questions and what they have been 
taught. 

"There's an upward bias," said Deffley. "By virtue of the match 
you're likely to have higher scores." 

* Once teachers become familiar with the tests, they tend to 
alter their teaching to anticipate what their students will 
[...] 



He formed a non-profit organization, Friends for Education Inc., 
and canvassed state education departments around the country. 
"We could not find any state that [...] 


* There are no industry standards on what students take the 
test. For the trial tests, many districts give the exams to all 
students, including those with learning problems. But many of 
the districts using the test exclude the scores of such 
students. "I find that I [...] 



improved in the 1980s. Although primary school scores have 
risen, high school scores have not. The Educational Records 
Bureau, which specializes in testing students in private schools 
and in wealthy suburban districts, reports that scores at all 
grade levels have been "stable since [illegible]." 

There is also debate over how far schools go to align their 
curriculums with the tests. Cannell charges that many schools 
are giving their students actual test items. Test makers 
acknowledge that some schools use the same form of the tests 
every year but they argue that the outright [illegible] alleged 
by Cannell [illegible] 

When all is said and done, everyone seems to agree that the 
standardized testing [indus]try is structured so that, except in 
rare periods when student learning is on the decline, it is 
impossible to have a test where half of the students will be 
reported to be above average and half below. 

"The testing industry wants to sell lots of tests, and the 
school superintendents desperately need high and improving 
scores," said Cannell. "Nobody is disappointed." 

Is there a moral or ethical issue here? The fact that educators 
and test makers are making public statements that they know are 
misleading to parents and taxpayers would suggest that there is. 
Cannell, however, says that the issue goes beyond old-fashioned 
truth to one of justice. 

"The appearance of high scores allows school districts to 
continue turning out functionally illiterate children," said 
Cannell. 

Finn, who acknowledges he did not realize the full extent of the 
testing paradox, suggested that it would be a "fine idea" to 
hold any future meetings in the Chatterbox Cafe [...] run the 
place in Lake Wobegon that serves up Powdermilk Biscuits 
wholesome enough to "give shy persons the strength to get up and 
do what needs to be [done]." 





Some explanations are themselves disputed, beginning with the 
assumption that schools have [...] 



Fiske writes for The New York Times. 






The Boston Globe 

MONDAY, NOVEMBER 20, 1989 

Critics press for alternatives to standardized tests 



By Muriel Cohen 

i ompuunu iboiR T^* abuM of 
»tud«fnt 'tsMnf proctdur«a ind 
•^xCT* 4n prompcmf educator* iM 

lucv lOMUlir aivl to >U*Q Che JM )f 
<>-t -^^UiLt 4» w*JtMtet of achool 

TM'TtlAtid pnhkmm haw b» 



eonfution that i unonal deinn(- 
Iwiut Co find btOff wty« Co t«t arri 
repon r«tulu is bane aatifaliaMd 4i 

Jif I'mvtracy ot North Carolina. 
b«rt Shinkcr rrMidant of tha 
Xmtncu r««itfition of Tearhan. 

rvcvfltlv caJi«d ' ' ^ tmmcdiat* itoD 
'o starMlardut<1 -r>u ac 'he ^'icinrn 
And >«con(lm '(chool c^tri^ 
An incptaaa in calia lor atroLtnt 
have W 'J) tJ» uae of >«* 'o 
jrffp th* efTecmff*^ of cwhm 
<»n.1 idirinutncon Scom or >.wai« 
JthwF^fnwfit «n<1 ruboriAl t**tji 
urn >c»>oJtoC)c Xptitudf 
^» jatfrt M * Tw.j4*urf of hiifh * n . . 

Ft cumptr w wndi'wn !,.,- 
J» rpn*waJ of Bc«u>n ^houu Sup*, 
itfndem UvaJ Wdion , vxntract j 
eMdenci? 'hat t#fc vom Sav n-*r 
jn<l#r ->» •♦nurp EUlon it jiurd 
rt'a that wm "Tonnud" m x-aj^ 
^ -cronbnf lo Muv EUct D. i* 
iu» uf -h* *rtt*ffi » '^rg oflW 

•riKv s:tir» ^ouid S» ' ir i ,r 

Tc- rg ecmpanm do 'o. r« jh 
o1 a "nrm ?>fn S*rjust •'\*f\ 
-a^ t »vuld t>f too fipensisf "if 

t-r* Mn lUrt — ^ •e,t-» ^n,, 



[...] scores in reading, writing and math has led to a good deal 
of cheating among school teachers, principals and 
superintendents, according to Dr. John Jacob Cannell, a resident 
psychiatrist at the University of [...] 

The test publishers are beginning to respond to education 
complaints. McGraw-Hill, which produces the widely used 
California Basic Skills tests, is [illegible] and trying to find 
a shortcut for establishing norms so there will be a new average 
each year. 

"Industry standard for norming is seven to eight years," said 
John Stewart, of McGraw-Hill's testing [illegible] based in 
Monterey, Calif. 

"We now print on all our reports the date of the norming so that 
everyone can't hide behind old dates. We no longer put answer 
sheets in test packages to schools, but send them to the testing 
coordinator to help avoid cheating," Stewart said. 

"We are looking into developing a new test on an annual basis. 
Now we can amortize the cost over an eight-year period, but if 
we have to produce one annually it will send the price up." 

It was Cannell who found that most school systems in the country 
claim that their students perform above national norms, which is 
statistically impossible. 

■"ScTiTMon iucal'K'nm I'^ft'itr^'r'i 

iirun*ucaii\ tKat fai \ 

liiwr* nation*] lorr " »- )U' t_jn 
1 n "u.* f>e**«t N»n 'Her* Nhiu 
iiiiucaton Cheat on Manr j-«dize»1 

U h ement Tens " 

\ presnotift 'i "vp»n'' -Vxj 
Ti., j»i K<iu 70 perwnt 'I VneiV " 
.•«'T*-tar> L-hudrer 'W p^rre' 
\r*nc*n nchoo! ilwtrtct* a* «i 
state* w#re testing above V putv 
'i.«S«r8 'nalionaJ 'Kwm iMleaH '1 
J>f expected V) perw-it." 



Cannell said there is a general lack of security in the storing 
and distribution of tests. "Teachers and principals have time to 
examine the tests and coach their students with similar kinds of 
questions on comparable time schedules," he said. 

"The battle is getting accountability in testing," Cannell said 
in a telephone interview. "Teachers will cheat to get scores up 
when the stakes are high. Superintendents and school 
administrators are the main beneficiaries of teacher cheating 
and many teachers don't think they are cheating." 

"Maine does it right, California does a creditable job and 
Massachusetts doesn't do too badly when compared to others," 
Cannell said. 

In a state-by-state analysis, he found that "the Maine 
Educational Assessment is reported with scaled scores and is no 
longer equated with nationally normed tests. Teachers may not 
obtain the test booklets until the day before testing, the 
booklets are shrink-wrapped, then [illegible] and testing is 
monitored by state officials." 

Maine education commissioner Eve Bither said that committees of 
Maine teachers developed the test over a two-year period. The 
assessment is given to all fourth, eighth and eleventh grade 
students. 

"We test very [illegible] with the tests. In a class of 12 in 
fourth grade science, for instance, it is possible that each 
child will have a different set of questions," Bither said. 

Bither concurred with Cannell on the standardized tests. 

"When the Maine Educational Assessment came out for the first 
time we found that the nationally standardized test painted a 
far rosier picture, that everybody here was [illegible] above 
average than what the MEA found," said Bither. 

Walter Haney of the Boston College testing center said that 
Cannell has done a useful service bringing parents' attention to 
test results [...] 







THE ARIZONA REPUBLIC 

Educators aid cheating, report says 

Seek job security by helping students on standardized tests 



Nvk Yoill ( AF> - EauGSton d 
int armtwt cfwai on . 

teu^ OOMfllMUII I( 



Tm 



M'S 




•vcimii." • rapon dwrnw 



, all manu nmum nottUy 




KcntriMf FMMiy fmrnomm i 



Dr Caiwitf i 



vtmDo- 1 W wxAi fipon dROMmjoc 
ifiai HudMi wm wan^ "atiove 
•vertflr ' oi MHiiiiMi Mn ■ ail 90 

lUWMttalllfllt 

That report asserted that scores on such [illegible] tests, 
designed so that only half the students should score above the 
50th percentile, were artificially high largely because 
[illegible]. 

The resulting overly bright picture of student achievement has 
been labeled the "Lake Wobegon Effect," after author Garrison 
Keillor's mythical Minnesota town where "all the children are 
above average." 



Man ai in- 

oira'lifi^OMMUd^ 

tact, mam itei aciioaia art fKxmg 





t Tf«, lilt r- 

A£llHIMMll T*:. UN SCMHt lU- 

aanana Tw af iaw Sialic aai Uh 
lawiTasaflaaeSkilla. 



%ymti§t m tta Wa« T« «( laac 
SkiUi. cntf n pmaK «( Kmnickft 



■ «n tht Corapiiaaaavt T« o.' 
itaMc Skiua, "aai^fl dit tan hm' 
Oaorpa and KaanicKv aav« anoag m» 




I IWa «• van af dw anva tar laiaicr 
aaaiaa. TMdMW. pnDca>aii 
il aJaMMUaiaw SviTbund 



law that aaaa> c auaK waeSnoit cnaa 
Uw piSuuSlBnaru 



iM ainac i 
SwiwMa 



Umr i ctoM, "5gnri riirm to ofc 



now t» wmOm or broan and adwoi 
dittncu can ot wbiact to aait tau- 
ovar partly on UH ainafUi «( aiaoaafO- 



Aooai^wtatripaR 

> ct\, «auB lofM tM aaiMiif 
tnion to rootiva Uw laM aartitr lAar 
(lit Oiy (It* laai ■ 10 M hMir 



Or CaaatN wi Fniiy thai n add< 
Doa 10 ifta MB dau at |»iiMrad tran all 
Mnaiaa. h r piaced an nd m tfte irmtw 
i cunial "EducatMB Woc k mviiin^ 



thiA 3 



admitjnK that Ltty ur c 
u. fipttaa wNA 100(1 or hthNd'suMni: 
univoparty All daoMaaoo ancnvmirs 

Or C 



— Oniv « dooon ism 
(«H booucu 00 Maiad 

Drafts ol Uw rtftor wart lawwg 
ihaaMunrrOvaoootn (oomif auuwr 
ut%, cluM pavciuainau aaa aaioaiarv 
nOudint CoaMcr Finn. Jr tonnr- 
aiaiauAt Oducaiion tocTr.«ry anc 
now • pratooaor of oducauoA ai vanotr 
bitt UnivetW\ 

Finn called we rcpon "a cooiuiic 
tnc and uacfUipMCtot wort * 

"ir Cannell u ncni and lia irac 
tvcord u audi Utai nc probably UL uoia 
are aoUfc anduoppv w ornaoiuip loa 
iKuni\ tlui It ' uke letiLii fcjoun 
maruiurvMMOrqtuiiiv in Prmce WtUiani 



One Tennessee teacher wrote that teachers in 
his school "spent the morning teaching the 
test and the afternoon giving it." 






THE WALL STREET JOURNAL. 



Classroom Scandal 
Cheaters in Schools 
May Not Be Students, 
But Their Teachers 

Pressure to Bolster Test Scores 
Drove Beloved Instructor, 
Nancy Yeargin, to Crime 

Is She a Martyr or Villain? 



By Gary Putka 

GREENVILLE, S.C. - Cathryn Rice could hardly believe her eyes. 
While giving the Comprehensive Test of Basic Skills to ninth 
graders at Greenville High School last March, she spotted a 
student looking at crib sheets. 

She had seen cheating before, but these notes were uncanny. "A 
stockbroker is an example of a profession in trade and finance. 
... At the end of World War II, Germany surrendered before 
Japan. ... The Senate-House conference committee is used when a 
bill is passed by the House and Senate in different forms." 

Virtually word for word, the notes matched questions and answers 
on the social-studies section of the test the student was 
taking. In fact, the student had the answers to almost all of 
the 40 questions in that section. The student surrendered the 
notes, but not without a protest. "My teacher said it was OK for 
me to use the notes on the test," he said. 

The teacher in question was Nancy Yeargin, considered by many 
students and parents to be one of the best at the school. 
Confronted, Mrs. Yeargin admitted she had given the questions 
and answers two days before the examination to two low-ability 
geography classes. She had gone so far as to display the 
questions on an overhead projector and underline the answers. 

Mrs. Yeargin was fired and prosecuted under an unusual South 
Carolina law that makes it a crime to breach test security. In 
September, she pleaded guilty and paid a $500 fine. Her 
alternative was 90 days in [...] 



THURSDAY, NOVEMBER 2 

Her story is partly one of personal downfall. She was an 
inspiring teacher who won laurels and inspired students, but she 
will probably never teach again. In her wake she left the 
bitterness and anger of a principal who was her friend and now 
calls her a betrayer; of colleagues who say she brought them 
shame; of students and parents who defended her and insist she 
was treated harshly; and of school-district [illegible] stunned 
that despite the blatant nature of her act she became something 
of a local martyr. 

Possible Motivation 

Mrs. Yeargin's case also casts some light on the dark side of 
school reform, where pressures on teachers are growing [...] 
enhanced the temptation to cheat. The [year illegible] statute 
Mrs. Yeargin violated was designed to enforce provisions of 
South Carolina's school-improvement laws. Prosecutors alleged 
that she was trying to bolster students' scores to win a bonus 
under the state's 1984 Education Improvement Act. The bonus 
depended on her ability to produce higher student-test scores. 

"There is incredible pressure on school systems and teachers to 
raise test scores," says Walt Haney, an education professor and 
testing specialist at Boston College. "So efforts to beat the 
tests are also on the rise." And most disturbing, it is 
educators, not students, who are blamed for much of the 
wrongdoing. 

A 50-state study released in September by Friends for Education, 
an Albuquerque, N.M., school-research group, concluded that 
outright cheating by American educators is "common." The group 
says standardized achievement test scores are greatly inflated 
because teachers often "teach the test" as Mrs. Yeargin did, 
although most are never caught. 

Evidence of widespread cheating has surfaced in several states 
in the last year or so. California's education department 
suspects adult responsibility for erasures at [illegible] 
schools that changed wrong answers to right ones on a statewide 
test. After numerous occurrences of questionable teacher help to 
students, Texas is revising its security practices. 

•mvascf Notice 

And sales of test-coaching booklets for classroom instruction 
are booming. These materials, including Macmillan/McGraw-Hill 
School Publishing Co.'s Scoring High and Learning Materials, are 
nothing short of sophisticated crib sheets, according to some 
recent academic research. By using them, teachers, with 
administrative blessing, telegraph to students beforehand the 
precise areas on which a test will concentrate, and sometimes 
give away a few exact questions and answers. (See related 
article on page [illegible].) Use of Scoring High is widespread 
in South Carolina and common in Greenville County, Mrs. 
Yeargin's school district. 



Experts say there isn't another state in the country where tests 
mean as much as they do in South Carolina. 

Under the state's Education Improvement Act, low test scores can 
block students' promotions or force entire districts into 
wrenching, state-supervised "interventions" that can mean 
firings. High test scores, on the other hand, bring recognition 
and extra money: a new computer lab for a school, grants for 
special projects, a bonus for the superintendent. 

Critics say South Carolina is paying a price by stressing 
improved test scores so much. Friends of Education rates South 
Carolina one of the worst seven states in its study on academic 
cheating. Says the organization's founder, John Cannell, 
prosecuting Mrs. Yeargin is "a way for administrators to protect 
themselves and look like they take cheating seriously, when in 
fact they don't take it seriously at all." 

Paul Sandifer, director of testing for the South Carolina 
department of education, says Mr. Cannell's allegations of 
cheating "are purely without foundation," and based on unfair 
inferences. Partly because of worries about potential abuse, 
however, he says the state will begin keeping closer track of 
achievement-test preparation booklets next spring. 

Students' Perspective 

At ''.rmiviljp Hirti Mrhniil neanwhilp 
^oiiip students fSprciilK m the ( brer 
ei4line!>*}uAd-*PiP niifh?d Ilshardin 
-'(p.din In 1 vear uld whv atniwHW 'hev 
like had to savs Mrs >»jrd S»m 
T •Hhiis tppeired ir 'he nrndorsthat , ir 
n^i 'he yfiM'is taiMlur red ind-whiie 
r,}{S, .(HT) .n the fn^rt i^n 'he hark 
"■Mns read V>e \A\f ill the answers 

Mans nlledjrues are intr> it Mrs 
^ifirnr She did i 'lI )f tunii siV3 
1 jthPkT Pju' »hi.i hid disc (nered the nb 
"iites We wf rk dAiv.n hird it whit we 'lo 
'-T Jiinn jf'e irxl tth tt she <1id \^ 
jAiir lipersinns >c i'\ >f gs 

Hut s*'\''ri 'euhers i)s<> viv iht* met 
If isis jiKjht ' 1 'he *'v|(irr ■( e^ nuil 
iri; t'li If I''* Is ':tand.T"l 

1,„ng jktii' she Ikl 







THE WALL STREET JOURNAL. 

THURSDAY, NOVEMBER 2, 1989 



Tests Often Match 
Materials in Kits 
And Study Booklets 

How a California Exam Has 
Same Question Included 
In Commercial Workbook 



By Gary Putka 

Staff Reporter of The Wall Street Journal 

Since chalk first touched slate, schoolchildren have wanted to 
know: What's on the test? These days, students can often find 
the answer in test-coaching workbooks and worksheets their 
teachers give them in the weeks prior to taking standardized 
achievement tests. 

The mathematics section of the widely used California 
Achievement Test asks fifth graders: "What is another name for 
the Roman numeral IX?" It also asks them to add two-sevenths and 
three-sevenths. 

Worksheets in a test-practice kit called Learning Materials, 
sold to schools across the country by Macmillan/McGraw-Hill 
School Publishing Co., contain the same questions. In many other 
instances, there is almost no difference between the real test 
and Learning Materials. What's more, the test and Learning 
Materials are both produced by the same company, 
Macmillan/McGraw-Hill, a joint venture of McGraw-Hill Inc. and 
Macmillan's parent, Britain's Maxwell Communication Corp. 

Parallels to Tests 

Close parallels between tests and practice tests are common, 
some educators and researchers say. Test-preparation booklets, 
software and worksheets are a booming publishing subindustry. 
But some practice products are so similar to the tests 
themselves that critics say they represent a form of 
school-sponsored cheating. 



"If I took [these preparation booklets] into my classroom, I'd 
have a hard time justifying to my students and parents that it 
wasn't cheating," says John Kaminski, a Traverse City, Mich., 
teacher who has studied test coaching. He and other critics say 
such coaching aids can defeat the purpose of standardized tests, 
which is to gauge learning progress. 

"It's as if France decided to give only French history questions 
to students in a European history class, and when everybody aces 
the test, they say their kids are good in European history," 
says John Cannell, an Albuquerque, N.M., psychiatrist and 
founder of an educational research organization, Friends for 
Education, which has studied standardized testing. 

Standardized achievement tests are given about 10 million times 
a year across the country to students generally from 
kindergarten through eighth grade. The most widely used of these 
tests are Macmillan/McGraw's CAT and Comprehensive Test of Basic 
Skills; the Iowa Test of Basic Skills, by Houghton Mifflin Co.; 
and Harcourt Brace Jovanovich Inc.'s Metropolitan Achievement 
Test and Stanford Achievement Test. 

Sales figures of the test-prep materials aren't known, but their 
reach into schools is significant. In Arizona, California, 
Florida, Louisiana, Maryland, New Jersey, South Carolina and 
Texas, educators say they are common classroom tools. 

Brisk Sales 

Macmillan/McGraw says "well over 10 million" of its Scoring High 
test-preparation books have been sold since their introduction 
10 years ago, with most sales in the last five years. About 
20,000 sets of Learning Materials teachers' binders have also 
been sold in the past four years. The materials in each set 
reach about [illegible] students. Scoring High and Learning 
Materials are the best-selling preparation tests. 

Michael Kean, director of marketing for CTB Macmillan/McGraw, 
the Macmillan/McGraw division that publishes Learning Materials, 
says it isn't aimed at improving test scores. He also asserted 
that exact questions weren't replicated. When referred to the 
questions that matched, he said it was coincidental. 



Mr. Kaminski, the schoolteacher, and William Mehrens, a Michigan State
University education professor, concluded in a study last June that CAT
test versions of Scoring High and Learning Materials shouldn't be used
in the classroom because of their similarity to the actual test. They
devised a [illegible]-point scale - awarding one point for each
subskill measured on the CAT test - to rate the closeness of test
preparatives to the fifth-grade CAT.

Because many of these subskills - the symmetry of geometrical figures,
metric measurement of volume, or pie and bar graphs, for example - are
only a small part of the total fifth-grade curriculum, Mr. Kaminski
says, the preparation kits wouldn't replicate too many, if their real
intent was general instruction or even general familiarization with
test procedures. But Learning Materials matched on [illegible] of
[illegible] subskills. Scoring High matched on [illegible].

Fifth-Grade Exam

In CAT sections where students' knowledge of two-letter consonant
sounds is tested, the authors noted that Scoring High concentrated on
the same sounds that the test does - to the exclusion of other sounds
that fifth graders should know.

Learning Materials for the fifth grade contains at least a dozen
examples of exact matches or close parallels to test items.

Rick Brownell, senior editor of Scoring High, says that Messrs.
Kaminski and Mehrens are ignoring the need students have for becoming
familiar with tests and testing format. He said authors of Scoring High
scrupulously avoid replicating exact questions, but he doesn't deny
that some items are similar.

When Scoring High first came out in 1979, it was a publication of
Random House. McGraw-Hill was outraged. In a 1985 advisory to
educators, McGraw-Hill said Scoring High shouldn't be used because it
represented a parallel form of the CAT and CTBS tests. But in 1988,
McGraw-Hill purchased the Random House unit that publishes Scoring
High, which later became part of Macmillan/McGraw. Messrs. Brownell and
Kean say they are unaware of any efforts by McGraw-Hill to modify or
discontinue Scoring High.







Rocky Mountain News



Vincent Carroll 



The great testing lie 



A prediction: when grade-school students take achievement tests this
spring, most of their scores will rise from a year ago.

Now the bad news. Those scores will be virtually meaningless.

From coast to coast, school districts conspire in The Great Testing
Lie. They permit the same standardized tests to be administered year
after year. Whether deliberately or not, teachers apparently adjust
curriculums to emphasize test material. There is no other way to
explain why so many districts consistently report rising scores.

Achievement tests for younger students especially generate laughably
implausible results year after year, and no one seems to mind.

Well, almost no one. A doctor in West Virginia cared so much he spent
$11,000 of his own money to compile and assess achievement test results
from every state and hundreds of districts. So far, no one disputes
John Jacob Cannell's remarkable conclusions, not even the testing
companies themselves. Among his findings:

■ About 90 percent of U.S. school districts claim to be above average
in student achievement, and most announce year-to-year improvement.

■ Every Southern state tests above the national average except
Mississippi, and even it is at the mythical national "norm."

■ More than 70 percent of American children are told they test above
the national average.

■ Twenty-six states test on a statewide basis, and all report
above-average scores. Six others, which have developed their own tests
and give them statewide, test above average, too.

The problem isn't merely that the same tests are reused. Most
elementary students score better than average even on brand-new
achievement tests, raising doubts about the accuracy of the norms
themselves.

Georgia, for example, should have everything going against it in the
testing derby - large numbers of disadvantaged children, high dropout
rates, low college-entrance scores among high-school seniors - and yet
Georgia's second-graders scored above the [illegible] percentile
nationally in every category the first year a revised Iowa Test
appeared.

Not every district reports above-average scores, of course.
Achievement is so bad in some big cities that even artificially low
norms only partly disguise the fact.

But give those districts time. If they stick with the same tests for
the next decade or so, they too may eventually announce that their
students have bolted into the ranks of high achievers.

See how easy it is to improve America's schools?

Vincent Carroll is deputy editorial page editor of The Rocky Mountain
News in






The New York Times



THE WEEK IN REVIEW 



Sunday, October 23, 1988 



WHERE WE STAND



By Albert Shanker, President
American Federation of Teachers



Getting a Fair Measure of Learning

Testing Needs Regulation

We live in an era of deregulation. It's unpopular to call for
regulation and government intervention, but I'm going to do it anyway.

It was not very long ago that people were routinely cheated of their
hard-earned income whenever they went to the market. Scales were often
fixed to give low weight, and containers didn't always contain what
they claimed to. Most of us now shop with confidence because there are
government regulations and inspectors. The problem hasn't disappeared -
some scales or taxi meters are still rigged - but those who try to give
us short measure risk being caught and punished.

We in education also need government regulation of weights and
measures, except that our weights and measures are not pounds of
tomatoes or taxi fare meters. They're standardized tests and test
scores. What's the problem?

A good example comes from a survey done by Dr. John Cannell of Friends
for Education (discussed in this column on Dec. [illegible], 1987 and
April 24, 1988) and by Daniel Koretz in the summer 1988 issue of The
American Educator. It showed that, according to the reported results on
the most widely used standardized tests, the majority of students in
all states were, like all the children of Lake Wobegon [illegible],
above average.

How could this impossibility happen? The answer is that the average in
standardized tests is not what most people think. If your child, school
or district is reported to be above average, you would reasonably think
that it means placing in the top half of those now taking the test. Not
so. What it really means is that [illegible] top half compared to a
selected sample group that took the test as much as 5 to 10 years ago!
Shouldn't parents and the public have an easy way of knowing this?

Dr. Cannell [illegible] also showed annual improvements in test scores
in states where school people had access to the test items in advance
but not in states where there was no such access. Clearly, some kids
are extensively prepped for the tests while others take them cold.
Shouldn't we know which states, districts or schools have access to
test items? Should we regulate such access? Are prepped students who
score well better than those who are not prepped and score poorly
because their teachers may be covering other, richer material? We don't
know.

[illegible] the new test is easier, student scores will likely go up -
especially [illegible] report them in comparison to the scores on the
old, harder test (and don't bother to tell you about the change). Or if
the new test gets you the same or worse results, districts may claim
that the new test was much harder than the old one or so different that
the kids needed time to adjust to it. Shouldn't we have a way of
knowing when a new test is used and which test is tougher or easier and
in relation to what?

Furthermore, the test scores just don't tell very much. If you're told
that [illegible] percent of the kids in a school or district are above
average, what can you expect them to be able to do? Write a decent
letter? Understand an editorial? Figure out the week's cost of shopping
from the supermarket ads in the newspaper? The fact is that you can't
tell from the way test scores are reported what students can or cannot
do.

One way to remedy this would be to create a National Bureau of
Educational Weights and Measures. What would such an agency do? It
would:

• publish a critical directory of tests that would describe and
evaluate the major strengths and weaknesses of each available test.

• do the necessary research so that scores on different tests can be
equated with each other, to allow people to know whether the
[illegible] percentile on the Iowa Test is the same as the [illegible]
on the Stanford [illegible] and the same on the McGraw-Hill Test.

• find better ways of assessing knowledge and skills than the
currently inadequate multiple-choice paper and pencil tests.

• develop and enforce standards for the use of standardized tests,
including test security, updating of norms, reporting of results,
[illegible].

• [illegible] government funds to do more rigorously what Dr.
Cannell's group did - survey all states and localities to see how they
are testing and reporting the results.

• provide assistance to states and localities that want to regularize
and improve their testing and their procedures for reporting to the
public and measuring [illegible], for example, by participating in the
National Assessment of Educational Progress (NAEP).

• investigate complaints about [illegible] testing practices and have
the authority to rectify abuses.

• handle complaints from parents, students or teachers about unfair or
ambiguous or faulty test items.

• help teachers, students, parents, the press, business and the
general public with information that will enable them to understand
test results and ask intelligent questions about the tests being used
and the test results reported.

These are things that need to be done so that we get honest measures.
No one is doing it now. Sure, we generally don't like government
regulations and wince at the thought of a possible new bureaucracy. But
in spite of this, I don't see anyone trying to abolish state, local or
national agencies that exist to make sure that [illegible] consumers
don't get cheated. The public wants to feel safe from [illegible] or
any other [illegible] to shortchange the customer. It's time to adopt
the same philosophy in education.







Chairman Hawkins. Thank you. I think you've presented your
views very well. We have had an opportunity to read your full
statement and I think you've done a great job in alerting us to cer-
tainly some of the specific things that need to be corrected. So we'll
look forward to asking you one or two questions.

The final witness is Mr. Ramsay Selden, Director of the State 
Education Assessment Center representing the Council of Chief 
State School Officers. 

Mr. Selden. Thank you, Mr. Chairman. I appreciate the opportu- 
nity to speak to you on this important topic. 

The Council of Chief State School Officers is a private, non-profit 
educational organization representing the 50 states and other 
extra-state jurisdictions on matters having to do with education.

The Assessment Center, which I direct, is committed or aimed at 
enhancing the information we have for evaluating the quality and 
the dimension of education in this country and for executing the 
states' responsibilities to contribute better information. 

We feel that the need for assessment information is critical. We
have to have a systematic basis in this country for knowing how
we're doing. I don't think it's any secret but the Council has been
committed to providing valid, reliable state-by-state information on
education since 1984. We have been at the front line of efforts to
expand the National Assessment of Educational Progress, to provide
state-by-state data, and we have been working on other areas in
educational statistics to develop a sound, useful information base in
education so we can gauge our progress.

We are seeing that as the stakes for education increase, it's be-
coming all the more imperative to base important decisions on
better educational tests. There are large, serious problems with past
testing practices. Tests that are widely used in the United States
are aimed at low-level skills which send the wrong message to
teachers and students.

There is an over-reliance on multiple choice-type items because 
they are convenient to use and efficient, but which are not effective 
in measuring important educational skills. In the report, there are 
testing practices, including the way tests are used in accountability 
systems that may result in unfortunate consequences for the educa- 
tion system. We need to avoid those. We need to head those off. 

But it is our belief that the solution to this problem is better test-
ing practice and that better testing practice is within our grasp. 
It's not beyond our grasp several years down the road. There are 
examples right now of several states and local school systems who 
have developed and are using tests which are much better than 
previously available. 

The States of California, Connecticut, Vermont and the State of 
New York are administering performance tests in education and
serving as a model to other states and school systems around the 
country. Vermont has committed itself to building educational ac- 
countability entirely on a portfolio concept of educational measure- 
ment. 

The City of Pittsburgh and the City of Portland, Oregon are ex-
amples of local school districts who have developed good testing
programs where the quality of instruction is not distorted or com-
promised to fit a test. On the contrary, the tests are developed to
emphasize and support desirable ends in instruction. 

We believe that the efforts that we have completed with Nation- 
al Assessment of Educational Progress in order to make it a suita- 
ble instrument for state-by-state comparisons are also transforming 
NAEP into a test representing good assessment practice. The task 
of deciding on the content to be assessed in NAEP as it became
state-by-state was assigned to the Council of Chief State School Of- 
ficers in concert with other organizations including local school 
groups, teachers' organizations, principals, and so on. 

This spring, in February and March of 1990, data were collected in
approximately 37 states in eighth grade mathematics using
NAEP. The content measured represented a substantial improve-
ment in the testing of mathematics.

Specifically, just to give one example, the emphasis on problem-
solving skills in math instruction and the extent to which math
tests address problem-solving skills had been minimal in the past. 
In setting the content for this assessment, we felt it was important 
to emphasize higher order skills in mathematics, and, therefore, it 
was stipulated that even at the fourth grade level approximately 30 
percent of the exercises be dedicated to measuring kids' problem- 
solving abilities. Essentially, new exercises had to be written by the 
National Assessment of Educational Progress to do that, and they
were. 

For the 1992 assessment in reading, which will be done on a 
state-by-state basis at the fourth grade level, we have incorporated 
a number of features to make the assessment technology more ap- 
propriate to guiding instruction. In the first place, the items and 
exercises that we have specified tap higher order cognitive abilities 
in reading than typical assessments in the past have tapped. 

The students read longer authentic reading passages so that the 
test is more demanding and more representative of real world and 
academic reading tasks. Performance from the assessment will be
reported in multiple scales or multiple forms reflecting the differ-
ent purposes and kinds of reading that students do and providing
information to educators that's more sophisticated and relevant to
their planning purposes than a single scale would be. 

In this assessment, over 40 percent of the students' response time
will be dedicated to open-ended responses. The reason for this is 
that we set objectives or had goals and ambitions for this assess- 
ment that simply were not amenable to multiple choice testing. 

The kinds of skills that are being measured here are tapping the 
students' global understanding of a passage which is best done in 
their words and tapping students' ability to evaluate and judge and 
express their opinion about a reading passage such as an editorial 
which is necessary to capture the extent to which you're training
people who can be critical readers for citizenship and intellectual
development. That, too, has to be done through an open-ended
format because giving an argument to somebody in somebody else's
words just won't do it. 

We're also trying out new methods in this assessment that are 
truly innovative and not widespread in other assessment programs 
and using, therefore, NAEP as a test bed for innovation in educa-
tional assessment. This 1992 reading assessment will use a portfolio
method to capture kids' reading activities and classroom reading
activities and it will also include an oral reading task allowing us 
to look at the relationship between oral language development and 
reading for the first time— an important aspect for reading educa- 
tors. 

It is our belief that we have to have accountability information 
in education. To head off its ills and misuses, we have to base ac- 
countability decisions on better tests. We feel that such tests are 
within our grasp, but we have to invest in the near future in devel- 
oping and using these tests to avoid a continuing over-reliance on 
outdated and flawed testing technology. 

I might add that we might also work toward coordinating and
knitting together local, state, national and international assess-
ment programs so that the best testing technology is shared among
them and so that the least amount of student time results in the
most amount of useful information for educational decision-makers
at every level.

Thank you.

[The prepared statement of Ramsay Selden follows] 






STATEMENT BY RAMSAY W. SELDEN
COUNCIL OF CHIEF STATE SCHOOL OFFICERS
BEFORE THE SUBCOMMITTEE ON ELEMENTARY, SECONDARY
AND VOCATIONAL EDUCATION
COMMITTEE ON EDUCATION AND LABOR
JUNE 7, 1990



Mr. Chairman and members of the Subcommittee, I am pleased to be here to address 
this important topic of assessment. I am Ramsay Selden. I direct the State Education
Assessment Center at the Council of Chief State School Officers. 

The Council of Chief State School Officers is a professional organization representing
the commissioners and superintendents of instruction in the states and other jurisdictions.
The Council also serves as a forum for states to work together on issues in education of
mutual concern. The State Education Assessment Center spearheads Council efforts to
improve the information base in education.

Importance of Assessment Information 

It is absolutely crucial that our society have sound, useful information on the 
performance of the education system. It is necessary for us to know how we are doing, and 
systematic collection of performance data is a major piece of the information needed.

The Council has been at the forefront of efforts to provide better achievement data
since 1984, when we adopted a dramatic new policy underscoring the responsibility of states
to contribute to a better information base and confirming the value of valid and constructive
achievement information. The Council has been heavily responsible for the expansion of
the National Assessment of Educational Progress, having identified NAEP as the
appropriate mechanism for collecting comparative data on the states, having developed the
objectives for the assessment in mathematics and reading as NAEP goes state by state, and
supporting the 1988 reauthorization for the expansion of NAEP.

Problems with Assessment Methods

As the stakes for states, local school systems, students and teachers increase as a
result of our interest in performance information and accountability, it is all the more
important to base educational decisions on sound tests. Past testing practice has been
plagued by major problems:

o overdependence on limited item formats, especially multiple choice; 

o overemphasis on lower-level instructional skills; and

o poor testing practices, including inappropriate preparation for tests, over-
interpretation of test results, and basing too many important judgements and
decisions on too little information or the wrong information from tests.

Better tests are part of the solution to these problems, and better testing practice is
within our grasp. It is not just a vague, unattainable ideal. Better tests would:

o use a broader array of more creative exercise formats to do a better job of tapping
student performance;

o emphasize deeper subject-matter content and more sophisticated reasoning in those 
subject areas; and 

o be used in ways that avoid inappropriate teaching to tests and inappropriate decisions
or judgements based on tests.

That better testing is within our grasp is illustrated by the following examples: 

o Large-scale testing programs using integrated tasks, performance items, or portfolio
methods have been developed by the Pittsburgh schools, the states of New York,
Connecticut, California, and Vermont, the National Assessment of Educational
Progress, and the International Association for the Evaluation of Educational
Achievement (IEA).

o The National Assessment of Educational Progress in mathematics this year included,
as a result of the specifications we developed, a 30% emphasis on problem solving
skills, open-ended item formats, and new sections using calculators.

o In 1992, following objectives and specifications we developed for the National 
Assessment Governing Board, the NAEP reading assessment will: 

address new, higher-order cognitive abilities in reading;

include longer, authentic reading passages;

report performance in multiple scales corresponding to different kinds of
classroom and real-world reading situations;

contain 40% of student response time in open-ended items, asking students
to respond to reading questions in their own terms; and

try new assessment methods: an oral reading fluency measure, a portfolio
approach to reading assessment, measures of students' ability to direct their
own reading skills, and an index of reading activities.






The assessment will also provide information on how different student groups are 
doing, and it will give information on reading instruction methods and resources, so 
states can begin to determine what may be causing their reading problems or 
successes. 

Responsible Assessment and Accountability

The American educational system must have accountability information. To head
off its misuses, we must base conclusions on better tests. But, we must invest in the
development of better tests and in better use of tests by educators, to avoid a continuing
overreliance on flawed and outdated methods of assessment and misuses of tests by school
systems.

We might also work to coordinate and knit together our various assessment programs
at the school, local, state, national, and international levels. This way, the best technology
can be shared, and with the least amount of intrusion on student time, we can gain the most
useful information about their performance for each level. We are working this summer for
the National Assessment Governing Board on recommendations on this area.

We appreciate the opportunity to comment on these important issues, and I will be
happy to respond to your questions or comments.






Chairman Hawkins. Thank you very much. The Chair would 
like to open up with several questions. 

Mr. Selden, you have indicated the manner in which you think 
state-by-state testing would be used. If we begin with a test that 
doesn't really indicate the potential of an individual to perform, 
whether it's on the job or in education, and then try through state-
by-state comparisons to ascertain just where we are, aren't we
building in a false system of actually assessing just where educa-
tion is today? Wouldn't it be rather useless to talk about achieving
goals when they're based on such methods of assessing just where 
students are and whether or not they are going to be first in math 
and science by the year 2000, or whether or not they are going to 
be ready for school? 

They may, by all of the formal test results indicate that they are 
ready, and yet not actually be ready and aren't we somehow kid- 
ding ourselves? 

Mr. Selden. No. You're absolutely right. We cannot base state-
by-state comparisons on tests that do not tap students' full poten-
tial to develop themselves intellectually. The setting of these objec-
tives for these state-by-state testing programs in the National as-
sessment, the one conducted this year in mathematics and the ex-
pansion in 1992, consciously set the content of the test slightly
ahead of where we know instruction is so we can tap our level of 
performance in terms of where we want to go not where we are 
now. 

I think the inclusion of relatively a substantial amount of prob- 
lem-solving exercises in this 1990 math assessment is a good exam- 
ple. That's beyond the attention given those skills in instruction 
right now, but we believe that's the direction that instruction 
should go so we have to take the ceiling off. We also have to send a 
message to students and teachers that these skills are important
and the tests are a very important vehicle for sending that mes- 
sage. 

I might add, though, that we can't change instruction only by
having tests that serve as a carrot out in front of the wagon. We
also have to give teachers and school systems support to show them
how to expand instruction and move it forward toward more ambi-
tious objectives too. The teachers have to be shown how to move
toward these objectives.

Chairman Hawkins. Well, let's say you are administering a test 
in reading. You may have individuals who are just not good read- 
ers, but they may be good performers on the job or may otherwise 
be good in their classes, but they just happen not to be good read-
ers. So they flunk that test or they are very low. Then you begin to
track them into some other lower grade educational experience as 
a result thereof, and the simple fact is they're just not good read- 
ers. 

Mr. Selden. Well, I think that it's important to 

Chairman Hawkins. They may have linguistic problems, for ex- 
ample, or cultural problems about reading. 

Mr. Selden. I think it's important to recognize what Dr. Haney 
said about using tests to make individual decisions about students. 
I agree completely with him that no single test should be used to 
make an important decision about a student that affects their
future or their fate but that much more information should be
brought into that decision. 

Chairman Hawkins. But aren't we doing it though? Aren't we
nevertheless relying very heavily on tests despite the fact that we
don't want to even challenge them and educational progress is
being measured that way year after year?

Mr. Selden. Well, again, I think there's a distinction between
tests that tap an overall level of performance so we can see how
the system is doing. We want to know how well kids are learning
to read in the elementary and secondary education system and we
need tests that can tell us that. But at the same time, Dr. Haney is
right. We should not be using any single reading test to track stu-
dents or to set their fate for a number of years in school process.
That's a practice that really has to be curtailed.

Chairman Hawkins. I guess the important thing is how are we
going to curtail it and prepare individuals for the future? We know
good and well that 85 percent of those entering the labor market
between now and the year 2000 will be minorities, immigrants and
others who have greater cultural and linguistic and many other dif-
ferences than the majority of the students.

Yet they're going to be tested out of current programs because
they're failing— they aren't going to get into college. They may get 
a high school certificate if they get that far— they may be discour- 
aged by the tests prior to that time and drop out. Yet these are the 
ones that we're concerned about in terms of developing their tal- 
ents. They may make a good technician in industry but they aren't 
going to get there because of our testing system discriminating
against them. 

Mr. Selden. I agree with you completely. Let me point out again 
that the tests that are being administered state by state for the Na- 
tional Assessment of Educational Progress in reading in 1992 will 
not be reported out on an individual basis. They will not be used to 
report individual scores. 

Most of the tests used in state and local school systems originally 
were selected to determine the performance of the system as a 
whole. They were intended to be program evaluation devices. I 
think what's happened is that in school practice, principals, teach- 
ers and other administrators have begun using performance on 
those tests to make individual student instructional decisions. 

Decisions are based on readiness tests administered at the kin- 
dergarten or first grade level and then decisions are made as soon 
as results become available for students on individual achievement 
tests. I would agree that that is an improper use of test to the 
extent that kids are classified based on one score. We know those 
tests are not up to that job. 

One other point here is that we have been frustrated that the 
profession of reading education has really absented itself from the 
discussion of how we ought to develop assessment techniques in 
education in reading. I think that the problem that you're talking 
about really calls for reading education professionals to develop a 
set of recommendations, recommended practices on the kinds of 
tests that should be used for student diagnosis and the ways in 
which they should be used because this is a big problem, a big void
that hasn't been filled, and we really ought to be able to look to
that profession to do it. 

Chairman Hawkins. Well, my understanding is that the Nation-
al Assessment of Educational Progress is coming out in the fall of
the year with a new system or with a refined system. They are 
going to classify achievement into three classes — basic, proficient, 
and advanced. 

Well, I can tell you now who's going to be in the advanced class-
es. I can tell you now who's going to be in the lowest classification.
It's going to be 35 or 40 percent of the students of this country. Yet 
they are going to be classified into the lowest class and they are 
going to have difficulties overcoming the stigma attached to that 
low classification. 

Now, I think this is a problem for the Nation. I'm not trying to 
zero in on the Chief State School Officers. It's a problem for the 
Nation because here you have an organization, although it may be 
doing a reasonably good job, it depends for its existence on the De- 
partment of Education. If the Department of Education desires a 
certain particular outcome, that organization is going to continue 
to be funded. 

If it differs with its creator in this instance, it isn't going to get 
the funding. Yet it relies heavily not on other alternative measures 
but primarily on a standardized testing system. And next they are 
going to come out, presumably, with national standards. 

Mr. Selden. Well, I take issue with the professional dependence 
of the Council on the Department of Education. We receive funding 
from dues from our member states. We receive funding from sever- 
al private foundations. My center is heavily funded by the National 
Science Foundation. So the Education Department is only one of 
several sponsors to the Council. We could live and survive and 
would continue to operate without their support. 

In replanning the National Assessment of Educational Progress,
we absolutely insisted that we would be able to come up with rec- 
ommendations for the best way to do that without any pressure or 
constraints from the Department as to the kinds of recommenda- 
tions that we would come up with. 

Chairman Hawkins. Well, the state departments depend on the 
Department of Education also for funding. 

Mr. Selden. Excuse me. 

Chairman Hawkins. I said state departments of education 
depend on the Federal Department of Education also. 

Mr. Selden. Well, I think that's true, but that's a whole other 
matter. I want 

Chairman Hawkins. I haven't seen one yet that wasn't trying to 
get as much of the Federal money as they possibly can.

Mr. Selden. I think that that's a topic for another hearing. I do
want to 

Chairman Hawkins. May I simply say that I'm not accusing you 
of any wrongdoing and I'm not in any way saying that there's something
evil or sinister in the operation. It's just human nature. It
just seems to me that if everyone profits in a broad sense from a 
system that protects those who are in the system now and doesn't 
challenge the system, that you are not going to get what we want 
in terms of achieving the National goals. The teacher and the classroom
are going to be judged by the extent to which results are obtained
according to accepted national standards.

I suppose it gets down to how can we get the independent profes- 
sional testing system that will be independent of these deficiencies 
that are built in. Who is going to do it in a professional way and 
not be dependent on its source of income from some political ideology?
I suppose that's the real problem. I don't have the answer. I
don't know what we're going to do about it.

Mr. Selden. I'll make the recommendations that we made from
the 1992 reading assessment available to your staff, and they can 
review those for their political independence from the Department. 

I think we have developed a concept of reading here which is in- 
tensely challenging to the U.S. education system, puts it on the 
line to see how it's performing and is not necessarily in the best 
interest of states or the Federal Government in terms of trying to 
make the current system look good. 

I think in terms of the results that you anticipate coming out of 
this 1992 assessment, you're right. There are a lot of kids who don't 
achieve well in the school system right now and unless a miracle 
occurs in the next two years, they're not going to be achieving 
much better by then. 

But let me remind you that this assessment is going to be taken 
on a sample of 2,000 kids in each state in the fourth grade level. 
That's about ten kids in each of 200 schools or 20 kids in each of 
100 schools, depending on how it's done. 

None of those kids will be given an individual score. None of
those kids will be stigmatized by taking the National assessment in 
reading. No school will be stigmatized because the schools are part 
of a state level sample, the kids in the school are not representa- 
tive of the school itself. So the school won't even be reported out 
and stigmatized, so the teachers in that particular school can't be stigmatized.
Instead, we're going to get information on the relative performance,
the representative performance of each state. That's
going to put pressure on the system as a whole to deal with the 
weak spots in the educational program. 

And yes, disadvantaged and minority kids are probably going to 
do poorly on this test, and it's my sincere hope and intention that
the comparative testing will stimulate and result in curriculum 
specialists at the state and local level, in teachers, in legislators, in 
policymakers looking at their practices and trying to identify effective
ways of doing a more effective job of teaching reading to kids who
do not do well now. 

Chairman Hawkins. Well, I hope you're right. But I predict that 
in five years, everybody's going to say we're doing wonderful and 
that we've progressed and that the students are learning and every
state is going to maintain that it's doing much better than the average,
et cetera.

Then we will find out eventually that in comparison, students in 
other countries aren't doing as well as we think we are doing because
we've been misled by false results in assessing the progress
of students. That's what I fear. I hope it doesn't happen, but that's 
what I see. 

I was encouraged by many of the statements in the Report of the 
National Commission on Testing and Public Policy. I think it's indicated
by the fact that I marked up my book so much that there
isn't very much left to mark up. 

But I think they've pointed out the very dangerous situation that
we have now that we built into the system, and that the tests are 
actually going to have a very negative impact on instruction and 
that we are still going to rate a certain percentage of kids as fail- 
ing. Yet, a lot of those kids with a little help — if the test didn't mis- 
lead us — with a little help, could be very good. That's very, very 
true, I think, in the inner cities particularly where we have a lot of 
immigrants and a lot of kids dropping out of school because they
become discouraged. 

So I hope that my pessimistic outlook is going to be improved be- 
cause we have experts like you who are here with us today suggest- 
ing some alternatives and being able to use those alternatives as 
soon as possible. 

Mr. Selden. Well, let me pick up on something that Mr. Haney 
said, and that is that we can't use tests that are biased in the effects
on students by virtue of their characteristics— their social,
their cultural, their economic characteristics. 

One of the problems with old style tests is that their reliance on
the multiple choice methodology and other kinds of test ques- 
tions—these are things that the studies on bias in testing indicate 
that middle class and upper middle class kids are much more com- 
fortable with. Whether they get coached in how to do these things 
or whether they have more experience with them, that's one of the 
explanations for group differences on standardized tests. 

One of the reasons that we're recommending that 40 percent of 
this NAEP reading assessment be open-ended is so that we can tap 
a kid's understanding in their own terms. So that regardless of how
the kid expresses himself or herself, when we ask them what does 
this story mean, we can use their words to make a judgment of 
what they have learned and what they have understood. 

That is, I think, a critical breakthrough in getting around the 
kind of bias built into unsatisfactory test formats in the past. I 
think it will be especially important for cultural, economic and lin- 
guistic minority kids who have been showing the worst difficulties 
in traditional test performance. 

Chairman Hawkins. Well, thank you. Mr. Goodling. 

Mr. Goodling. Thank you, Mr. Chairman. I have a lot of the
same concerns that the Chairman has. I guess in our drive to insist 
on excellence, and that's what we are trying to do here in the Con- 
gress—instead of just passing pieces of legislation, we're trying to 
determine how well they're doing and trying to emphasize excellence—my
concern is that as we do that we may not have the
proper tools to determine whether there is excellence or not as a 
result of our program. 

I always, as an educator, insisted that the teachers use tests primarily
for one purpose and that was to determine where the
youngsters were doing poorly and then do something about it. I 
would hope that we would continue to emphasize that part of test- 
ing. 

My secretary of education in the state from which I come had 
the great idea that he used tests to rank schools. His purpose for 
using tests certainly was not fine because he then published a list
of how the schools were ranked throughout the state which, of
course, was utterly ridiculous. 

He had the great idea that Upper St. Clair was number one in 
the state. Upper St. Clair should have been one. All the parents 
are Ph.D.s. They have more money to spend than Carter has liver 
pills on education. Now, if you take test scores from that area and 
compare them with those of a school district with a very small
taxing base, that is totally unfair. 

So my hope is that whatever you design in the future, it will first 
of all be used strictly to help students improve whatever their
weaknesses are and not to rank and rate because I think that's a 
misuse of tests. 

I don't really have any questions for any of you, just a hope that 
you will continue doing whatever you can to make sure that tests 
are worthwhile, effective, and measure whatever it is we're trying 
to measure. 

Dr. Faldet. Mr. Chairman? 

Mr. Sawyer. Yes. 

Dr. Faldet. If I may, I would like to make a comment or two rel- 
ative to some of the testimony that was given in response to Mr. 
Hawkins' question. May I be permitted to do that? 

Mr. Sawyer. Surely. 

Dr. Faldet. One of the statements made was that the fate of a
student who scores low on a standardized test— and I'm not speak- 
ing of the National assessment type of test, but rather those that 
are given in most schools at least once a year— the fate of that stu- 
dent is somehow to doom him or her to perpetual educational no 
man's land. 

That is not the appropriate fate for that student. What is the ap- 
propriate fate is that based on information from a multiple choice, 
norm-referenced test used diagnostically is to take that student and
do something with him or her, or the groups, that is different than 
has been done in the past. 

Indeed— if I may relate just one success story among all the fail- 
ures we hear about— In 1971, that's a long time ago, a norming was 
done on one of the major standardized tests and those norms were 
used for the next seven years. Indeed, in schools using that test, 
the average of each school began to improve. Now, you could say 
it's because they were familiar with the test or they were teaching to the
test. 

In 1978, a test was re-normed. We went out and got new students 
who had never seen that test before. And indeed, in grades kinder- 
garten, one, two and three, performance had increased. Now this 
was good news, except that when you apply that new standard now 
to the schools that have been using the test, the percentile dropped. 

What brought about this change as we interpreted it? Well, one 
critical factor was the Chapter One funding that had been going on 
because this happened in those very areas where efforts had been 
made to assist those students who indeed were low-scoring students.

So I would argue that means exist today within the schools and
within current testing policy and practice to make a difference
based on those scores. I think that when the scores, however, 
become so embroiled in the political arena, it does indeed take
away from the teacher the motivation to do the right thing rather
than to teach to the test. 
Thank you. 

Mr. Sawyer. Mr. Hayes. 

Mr. Hayes. Thank you, Mr. Chairman. Let me first apologize for 
my getting here a little late. As you know, it's common practice
around here to do two things at the same time. Therefore, some- 
thing has to suffer. 

I have one question, I guess, directed towards Dr. Haney. You 
are here representing the National Commission on Testing 

Dr. Haney. Yes, sir. 

Mr. Hayes [continuing]. And Public Policy. After three years of
study, the National Commission is now criticizing the use of stand- 
ardized testing. I believe, too, that standardized testing has been 
used to weed out people of opportunities. The Commission con- 
cludes that under no circumstances should individuals be denied a 
job or college admission exclusively based on test scores. 

Now, my question is, could you elaborate on what other factors 
can be taken into consideration beside test scores for, let's say, en- 
trance into an institution of higher education? 

Dr. Haney. Yes, sir. A prime example in that case would be a 
student's previous academic record concerning, for example, both 
the kind of courses they have taken and the grades that they have 
received in them. 

There are certainly problems with grading practices in our sec- 
ondary schools across the Nation. But quite consistently over a 
period of 50 years research has shown that students' high school 
record actually predicts their subsequent performance in college 
better than standardized college admissions tests. Moreover, evi- 
dence clearly shows that if you did rely more heavily on students' 
academic record in high school for college admissions than on 
standardized college admissions tests, that would result in less ad- 
verse impact on groups of individuals who tend to be particularly 
adversely affected by decisions based solely on standardized test results,
such as individuals from African-American backgrounds, individuals
from Hispanic backgrounds, individuals from poor socioeconomic
status homes.

That's a very practical example with regard to the question you 
raised, sir. 

Mr. Hayes. Can you maybe speak just briefly to the needs of the 
year 2000 labor market? How the current use of standardized testing
may negatively impact on our readiness for competition which
we have so often alluded to? 

Dr. Haney. Yes, sir. I can give you some concrete examples that 
were brought to the attention of the Commission and then try to 
speak to you briefly about what the Commission recommended to 
try to remedy those kinds of problems.

One example. When the U.S. Employment Service was trying to 
develop a new referral system that wouldn't be as expensive,
they started experimenting with a referral system for people who
go to the Employment Service looking for jobs which would be
based exclusively on a test called the General Aptitude Test Bat- 
tery. It had been used for employment referrals based on some 
theories that I won't try to recap here. 






But what they found was that in some communities when they 
started placing exclusive evidence on this standardized test for the
purpose of employment referrals, some groups of individuals — dis- 
advantaged individuals, unemployed— simply stopped using the 
Employment Service because they felt they couldn't get a fair 
shake on this test. 

So the use of the test was clearly undermining a prime objective 
of the Employment Service to try to help place people who are unemployed
in jobs. That's just one example of how over-reliance on
standardized employment tests can undermine vital employment 
policies. 

That kind of issue is going to be increasingly important because, 
as I believe Chairman Hawkins alluded to, we're going to be
having vastly increasing proportions of the entry-level work force 
in the next ten years composed of minorities and women than has 
been true in the last 90 years. 

Now, that suggests to the Commission that what we've got to do 
is to try to avoid, not just in education, but in employment selection
practices, over-reliance on just one form of evidence. There's
been considerable research, particularly on employment assess- 
ment, that relying on alternative kinds of assessments than stand- 
ardized tests reveals not only equal validity with standardized test 
results but also smaller adverse impact on the sorts of groups that 
have historically been disadvantaged in our employment system in 
the past. 

Our full report does point you to some of the examples of those 
kinds of alternative assessments and the evidence concerning their 
validity and lesser adverse impact. 

Mr. Hayes. Thank you. Does anyone of the other members of the 
panel want to comment on that question looking ahead to the year 
2000— our readiness, so far as the labor market is concerned? Are 
you satisfied with the response that I received from Dr. Haney? 

Mr. Selden. Well, I would add that apart from the employment 
identification or screening practices which Dr. Haney's addressed,
having a competitive work force in the year 2000 also depends on 
having an effective school system. Having an effective school 
system in my mind hinges in part on having valid, useful informa- 
tion on how kids are learning. 

That's going to require better tests and better use of tests be- 
tween now and 2000 and monitoring the system to make sure we're 
getting enough students who can do what our labor force needs and 
to make sure that all students have an equal opportunity to pros- 
per in the education system and to join that labor market. 

Mr. Hayes. I know I've exhausted my time, Mr. Chairman, so go 
ahead. Thank you very much. 

Mr. Sawyer. Thank you. Mr. Petri. 

Mr. Petri. Thank you. Thank you, gentlemen, for coming here
today. 

When you talk about tests, are you talking about testing for 
knowledge or ability? I mean, what is it that these tests are de- 
signed for? There's a difference. 

Dr. Faldet. May I respond to that?

Mr. Petri. Yes, because there's a difference between testing what 
people know and their ability, and it would seem to me that one of
the objects would be to see if there is a difference between a person's
level of knowledge and his level of ability. This is a separate
question from the fact that when people take any test there's a 
continuum and some people do well and others do badly. That's in 
the nature of things. If everyone does 100, forget it. But if there's a 
big gap between ability and performance, for example, then the 
test can maybe show that, and you have a person who has a poten- 
tial that has not been realized and we ought to do something about 
it. 

If the person's ability and performance are both lousy, well,
okay. Or if they're both great, okay. But when there's a difference
between the two levels, then the test has revealed something that
shows that we're not helping people. And that seems to me the
value of tests, really — not to disguise differences between people or
point them out or anything else. 

Dr. Faldet. I would like to respond to that just a bit because I 
would like to remind the committee that we're talking about sever- 
al different kinds and levels of tests here today. There are those 
tests for college admissions, the SAT, the ACT, the National As- 
sessment sampling type of testing in terms of the people that are 
contributing to the data. And we're talking about employment test- 
ing. 

But I think one of the key problems that all of us recognize is 
that testing which goes on in elementary— particularly elementa- 
ry—and secondary classrooms. It's that testing that is not a pass/ 
fail kind of situation. It is indeed on a continuum. But within that 
time spent in testing, which may be a day and a half of time, you 
get potentially a tremendous amount of information. And that can 
be both with respect to some measurable abilities as well as some 
measurable current levels of skill. 

It's that kind of information which needs to be acted on. You 
want to make sure that you use test information in instruction. I 
think we get very fuzzy when we are afraid to use test information 
about an individual to say I need to do something differently with 
you, or with you two or three students because you do not yet have 
the basic skills that will permit you to do reading problem-solving 
at some later point. 

We very recently have had a great deal of concern about back to 
the basics, and have we forgotten that? Are we no longer concerned 
about those things which are commonly measured in standardized 
tests? I would invite the committee to look at any elementary test 
battery and ask yourself what is it that is being measured here 
that I really am not concerned about students knowing or having
been acquainted with. 

If indeed we are abominably failing in teaching these things, as 
shown by standardized tests, or even if we are teaching to them, I 
think the problem needs to be addressed in addition to those prob- 
lems where misuse of test results for political reasons, pass/fail, 
employment decisions, are also an issue. But it is a different kind 
of issue, and I would not like to see them mixed. 

Mr. Petri. There's a lot of talk about how we are going to be prepared as a
country economically. But it seems to me as a government, we 
have another big concern and that is that we have a literate community
of people prepared to exercise the responsibilities of citizenship
in a free society. This was always one of the great rationales
for having public support of universal education, so that everyone 
would have an opportunity at least to learn how to read and write, 
read the newspaper and follow events, and contribute to a demo- 
cratic society. That's the basic rationale for public education in my 
mind anyway. 

Businesses and other people will be able to figure out ways at the
end of the day to impart skills so people can be productive, but
they don't necessarily have an interest in preparing individuals for
the responsibilities of civic participation. 

In that connection, we each think of our own background, I sup- 
pose. I remember a teacher I had who would make us get a score of 
90 or above when we were seniors in high school on the parts of 
the English language such as adverbs, nouns, verbs and so on. If 
you got below 90, as far as she was concerned, you came after 
school for an hour. The purpose of the test was to raise everyone in 
the class to a certain basic level. She didn't feel you should gradu- 
ate from high school if you did not understand the basics of the 
English language and how to put a sentence together and all that 
sort of thing. 

So people were just given extra instruction and they kept on 
coming in until they all got above 90, even if it took a week or a 
month or two months on a particular part of the exam — and there 
must have been about 50 different exams that we had to take in 
the course of the year. 

So it seems to me that rather than one snapshot of someone's 
performance, the test can be used as a guide for helping people to 
reach at least a basic level of competence which we want to encourage
and expect all people to have if they're to be participating citizens.
Is that at all a valid approach, or is that a waste of our time?

Dr. Faldet. No. I think that is what testing in the schools is all 
about. It is a first step in early identification hopefully of students 
that are having some difficulties in some areas that are agreed by 
the school and generally, I think, nationally are important things. 
Things that are basic to subsequent learning. 

They are not the end of what should happen because there 
should be confirmation. I think there are students that have bad
days on a standardized test or a custodian is mowing the lawn right
outside the window. Yes, that can happen, obviously. But you first 
of all confirm and say, yes, this is consistent with what behaviors 
I've observed and now I have some evidence that enables me to confirm
my thoughts and we're going to do something about this.

That's where you develop then, as a part of a total assessment
evaluation plan, what your next steps are. I think that is key to 
any, if you will, guidelines that you would put out in terms of use
of test information. It's got to be in the context of it being a con- 
tinuing thing and that actions are taken as a result of it. 

Mr. Sawyer. Let me interject, if you don't mind. We've been 
talking about a couple of different fundamental differences in the 
use of testing. One of these is the notion of the gatekeeper, the 
portal through which everyone must pass— a rite of passage approach
to testing. The one that is far more complex and useful in
the longer run is testing early and often for diagnosis and then tar- 
geted remediation. 








The frustration that I have is that as we talk about gateway pro- 
cedures, we speak to the inadequacy of current testing procedures 
to be used and the consequence of stigmatizing whole population 
segments, schools and school systems, and individuals, is a matter 
of deep concern for all of us. If these instruments are inadequate
for that broader less precise purpose how do we move them as tools 
into the realm of early diagnosis and targeted remediation which 
I'm sure we can agree is a preferable approach?

Dr. Haney. Could I suggest a very quick answer to that by 
coming back to Mr. Petri and asking a question about your teacher—your
high school teacher who did this testing with regard to
grammar. Did she— after you took the test what happened? Did she 
give it back to you? 

Mr. Petri. The ones that we got wrong we were told about, and
then we had to take another test the next day. 

Dr. Haney. Right now because 

Mr. Petri. She'd give it back to us. Sure.

Dr. Haney. She gave it back to you. 

Mr. Petri. Yes. 

Dr. Haney. Right now because of the nature of most testing tech- 
nology, including the commercially published standardized tests 
that Dr. Faldet was talking about and the National Assessment in- 
struments, which I think on many counts are quite good, that does 
not happen because you cannot give students immediate and de- 
tailed feedback on what they learned without invalidating those 
items for future reuse.

Thus, when the schools have been under a lot of pressure to improve
test scores, you find exactly the kind of Lake Wobegon
effect that our third witness talked about today. I am sorry, it's 
Doctor— or Mister? 

Mr. Faithorn. Faithorn. 

Dr. Haney. Mr. Faithorn spoke about with regard to his colleague
Dr. Cannell's useful work.

So that I think we have to be careful, number one, to distinguish 
between different kinds of testing for the purposes of allocating opportunities
at major transition points in people's education and employment
careers. They are really instructionally aimed. For instructional
purposes, you want them related to what's being
taught, you want to be able to provide feedback and so you are 
talking about fundamentally different kind of testing right now. 

Mr. Sawyer. Well, not necessarily. We're talking about test in- 
struments that test kids in fourth, eighth, twelfth grade. To get to 
the twelfth grade, or maybe even eighth, they become rites of passage.
But even if you wait until the fourth grade, it's too late to do
the kinds of things you're talking about.

Dr. Haney. It's too late. 

Mr. Sawyer. And that real diagnostic instruments need to come 
early and often and become the kind of tools that will help teachers
target their efforts and approaches. Perhaps that strikes people
as a little brick schoolhouse approach, but it worked for a long 
time. 

Dr. Haney. Well, I think that some new testing technology is 
going to make that more possible so that you can give people detailed
results without invalidating the test for future use.






Mr. Sawyer. Let me move to my second question then because in 
doing this we've talked a great deal about changing the structure 
of testing. As we move to particularly more open-ended answers,
and answers that require subjective analysis of a response, how do 
we go about standardizing the quality of evaluation? Has there
been much thought given to that sort of thing, gentlemen?

Mr. Selden. Can I respond? I think that we are demonstrating in 
tests that are used on a wide scale basis that open-ended responses 
can be scored validly, accurately and reliably. 

I think the best example is in writing. We used to have multiple
choice writing tests in education and now a number of states and a
lot of local school districts and I think there are several commercial

Mr. Faldet. And publishers 

Mr. Sawyer. Well, my friends who take the bar grumble about 
that all the time. I don't know, I've never done it.

Mr. Selden. It's a matter of setting criteria. Given how a person 
responds, you have to preset criteria for what you are going to
deem an acceptable or unacceptable response and then people can 
be trained to score the responses and judge whether they're correct
or acceptable or high in quality and low in quality. And that's 
done. 

The state of New York has a fourth grade science test where kids 
come up to a table and they actually conduct an experiment in 
order to find out what shape an electronic circuit is. They are
watched while they are doing this and the teacher judges whether 
or not they successfully designed an experiment and carried it out 
to do it. But it's an integrated task, it takes a certain amount of 
time for the kid to do. Many kids may do that in different ways. 
But it was administered to every fourth grader in the state of New 
York. 

Dr. Faldet. Mr. Chairman, indeed, you are right. There are certainly
available now some writing sample kinds of tests together
with keys for scoring. I think that you also find some portfolio concepts
available commercially.

I think the important thing is to recognize that whatever change 
there is, you are going to have to convince and involve those local 
educators without whose commitment, understanding and support, 
whatever is legislated or developed is not going to get implemented 
appropriately, and I would suggest we're seeing that now. 

I think we were in a period when standardized tests were used far
more appropriately than they are today. I don't think they were 
any less representative of the curriculum that we wanted. But 
there have been pressures that have created now high stake situa- 
tions revolving around that and I would urge you to make use of 
the expertise of those who deal daily, weekly and annually with 
schools in assistance in designing how this is going to be done and 
how you introduce it and implement it.

Mr. Sawyer. Thank you. Mr. Smith. 

Mr. Smith. Thank you, Mr. Chairman.

Mr. Faithorn. Mr. Chairman? 

Mr. Smith. Let me, if you could because I am going to have to go 
in about 10 or 15 minutes and maybe your question will fit with 
my request, if that's all right sir. 






We are on the verge, I think, of making the same mistake we 
make every time we talk about how to make schools better. It is not
a point, with all due respect, of convincing or just involving teachers
and school boards to use existing or new technology. It is a
question of asking them what it is they as professionals would do. 
If we want to make diagnosis the basis of how we determine how 
much value we are adding to children's lives cognitively and in 
terms of skills and behaviors and attitudes, then, in fact, it must
start with the school, not end with the school. 

Paulo Freire would not be welcome, in my mind, in any of our
schools. Yet he still has— especially for children who do not share
the so-called dominant culture of our country— he still has for my 
money the single best philosophical instructional approach to 
teaching reading, which is to take the words of power in the cul- 
ture from which you come and use them to pull the child to learn 
how to interact verbally and in reading and writing with a culture 
that he or she is going into. 

Somehow, if we want education, which is derived from the words 
ex ducare — to lead from— to lead beyond, if that's really what
we're serious about, we need to figure out a way to blend the social 
imperative of schools, which is a common socializing experience 
which binds our culture together on its good days, and then the
notion of excellence, which is that we maximize the capacity of 
every student. 

From my point of view, that means that every time we use a test 
simply to judge, it is an external operation determined by some- 
body else and it in fact by definition has to be destructive to the 
educational process which would be based on diagnosis and evaluation 
which would involve not every three years but hopefully 
twice a year or more cogent comments about how well a student is 
doing and what that individual knows or can do differently than he 
or she could do six months before so that parents can understand it 
and the community can understand it in relationship to what their 
goals are for the student. 

I think it speaks strongly for the idea of flexibility in our schools 
so that we could teachers — I happen to have a bill which does 
that 

[Laughter.] 

Mr. Smith [continuing]. So that we can ask teachers and boards 
how it is they would like to organize an educational program so the 
capacity of every child is maximized and can be described in real 
terms. 

Two questions. One, and these may — I think there's an economic 
plot here too. I wondered to the extent any of you have investigat- 
ed or have opinions about the connection between the textbook in- 
dustry in this country and the testers because the last time I 
looked there's big money on the table and they go down in Texas 
and California — with all due respect to some members of this com- 
mittee—involves a whole lot with how it goes down in the other 48 
states because there's a lot of kids and books. I think there's an 
unholy relationship, and if we don't understand the economic 
impact, I think consequences of reforming testing, we're never 
going to get at it. 






Secondly, the question of whether a little diversity and how we 
evaluate and describe learning— not saying there's any one way, 
but letting states and schools go at it differently for a while until 
we find out what the good practice is and let it bubble up. 

How do you feel about diversity and how do you feel about the 
testing and text alliance in this country? 

Mr. Sawyer. You each have 30 seconds. 

[Laughter.] 

Dr. Haney. Very briefly 

Mr. Sawyer. I was kidding about that. 

Dr. Haney. Okay. I could very briefly, in 30 seconds indeed, say 
that I think your concern about the changing nature of the test 
and textbook publishing industry is right on target because there 
has been a tremendous number of acquisitions in both the textbook 
publishing and the test publishing industry over the last 10 years. I 
can't cite them off the top of my head, but I can provide you with 
some documentation of that. 

Mr. Smith. Please do. 

Dr. Haney. Not only, though, must we worry about test publica- 
tion and textbook publications, but also there now is questionable 
practice in people who publish tests, publishing test preparation 
manuals for those tests which appear to have been adopted fairly 
widely in some schools at substantial cost. So I'd say that is a concern 
that is salient right now in light of mergers that have happened 
over the last five to ten years. 

With regard to diversity, I think you're absolutely right. There is 
considerable evidence on the basis of assessments that have been 
made in the past and studied through research that when you start 
using different methods of assessment, you start beginning to see 
talent in different people and in different groups in different ways. 

There is a tension, as you alluded to, between trying to educate 
people and trying to make judgments about them. To the extent 
that we want to form educational decisions based about people and 
students— particularly young students — in context, we have to rely, 
I think, more on the nonstandardized evaluation systems that grow 
out of the local context because there has been research that shows 
that things as seemingly innocuous as to whether or not students 
have had breakfast in the morning, can significantly affect their 
standardized test results. 

There's no way that the companies that Dr. Faldet represents 
would have any way of knowing that when they score the test re- 
sults. You'd have to rely on the teachers who know the students in 
context. 

Mr. Faithorn. Mr. Chairman? 

Mr. Sawyer. Mr. Faithorn. 

Mr. Faithorn. I'd like to respond to Mr. Smith's question and 
touch on Mr. Petri's comments also. 

Mr. Petri was talking about a teacher on one end of a board and a student 
on the other— the ideal learning situation, if it's a good teacher. 
The testing that we're upset about and Friends for Education 
doesn't provide any feedback — and with respect to Mr. Smith's 
question— it involves really serious money, what Mike Royko in 
Chicago would call really serious money— the cost of these standardized 
tests and the textbook that goes with them and the preparation 
books that go with them and the relationship between the 
publishers of these tests and the school boards. It's big business. 

With respect to Mr. Hayes' question earlier— Mr. Hayes earlier 
asked a question about going into the year 2000, what ideas did we 
have with respect to that. I would like to respond to that kind of 
indirectly by saying that I went around to the Department of Education 
and met with officials in their Undersecretary for Research 
Office to better prepare myself for this first time I've ever been 
before a congressional committee. 

They confirmed the fact that they had checked out our study 
about the phoniness of standardized test results and felt we were 
right. But they said that they had not checked on any of the impli- 
cations about cheating, our allegations about wholesale cheating 
that's going on in the schools to make the student look better and 
the school look better in passing these standardized tests. 

They said the reason they hadn't done that because this was an- 
ecdotal and therefore it didn't lend itself to any real verification, 
and furthermore the Congress and the school districts didn't want 
the Department of Education messing around their affairs to the 
point where the Department was examining into procedures in 
schools, where comparing state to state, or school board to school 
board. That this was a nightmare to all these people and that the 
Department of Education should damn well stay out of it. 

I come to my point in answer to your question. I was appalled by 
this and I think the Department of Education ought to damn well 
be getting into questions like that if we are going to do something 
between now and the year 2000 in closing the gap between our kids 
and the other Western democracies and industrial states of the 
world. Thank you. 

Mr. Sawyer. Mr. Poshard. 

Mr. Poshard. Thank you, Mr. Chairman. I'm sorry I got here 
late, I had some other committee meetings this morning also and 
so I didn't get to hear the original testimonies. My question to Mr. 
Faithorn is why would we even need to cheat if the tests are developed 
as you have explained in your testimony, and I'm assuming 
you have some evidence through the study in which you engage 
and so on, to show that they are. 

It seems to me that if the norm group is just a group that is 
tested cold and then that's compared against students who have 
studied material to take this test for a whole year and the differ- 
ences are compared, why would you need to cheat. Why wouldn't 
we come out above average on everything? 

Mr. Faithorn. Well, you're quite right and Dr., this gentleman 
on my right, explained just why the performance improves every 
year when it's compared against an old norm. There isn't really 
need for cheating but it goes on wholesale anyway. I don't know if 
you saw CBS's 60 Minutes 

Mr. Poshard. Yeah, I understand that, but that's not the ques- 
tion. I understand that there is some cheating, but I'm more inter- 
ested in the other facet here. Mr. Faldet would you elaborate just a 
little bit about Mr. Faithorn's statement of the way the norm 
group is established? I'm sorry if this has been asked already. 

Dr. Faldet. No. I don't think it has. Certainly. The goal of setting 
a standard for a period of time is to make sure that it's representative 
geographically by ethnic groups, socioeconomic groups 
and so forth, large districts, medium/small districts— so some 
rather elaborate strategies and techniques are used to seek the co- 
operation of randomly selected districts throughout the United 
States in taking a test for which they will not receive any scores 
because there really isn't anything to report back to them at that 
time, and that can influence the level of motivation on the test. 

But from those studies a variety of things come. Certainly, as- 
signing the percentiles— what represents the 99th percentile, what 
represents the median— the 50th percentile by grade and semester. 
In addition, that's where you get the reliability and the beginning 
of validity studies that have to accompany each standardized test, 
but then you begin to give it to people who have not seen it before, 
but who have chosen this test hopefully because the objectives 
measured are as consistent as possible with the objectives that 
their district is emphasizing. I think that's key. Then they begin 
taking it and then indeed the scores may begin to rise. 

Now I don't know how much of that is because teachers are 
teaching to the test or teachers are indeed continuing to emphasize 
the objectives that the test is measuring. In the latter case, I 
think it would be good. In the former, it's abominable. 

Mr. Poshard. I'm assuming there's both pre-study test and post- 
study test. Right? You're not talking about giving this test one 
time to a group of students and establishing a norm group. Right? 
You're talking about giving the test before school, having a full 
year of school for the norm group and then testing after the year of 
school. Isn't that the way you establish the norm group? 

Dr. Faldet. Yes, sir. It's given generally in the fall and it's given 
again, probably alternative form to get some variety there, in the 
spring so that you have some pretty good data on what growth has 
gone on. 

Mr. Poshard. Okay. Then my question is to Mr. Faithorn. In 
your testimony you described this group of students upon which 
the norm is established as taking the test cold. What do you mean 
by that if the students take it before study and after study, how are 
they taking it cold? I'm trying to collaborate data here so I under- 
stand this. Why would you say they're taking it cold if in fact they 
have spent a whole year studying the material? 

Mr. Faithorn. Well, first of all, let me apologize for the kind of 
casual and sloppy language that I used. I thought that this would 
convey the idea of what I understand goes on which is that a new 
test is developed by McGraw Hill, let's say. They give it to a group 
of students and they get the results and then those results are used 
for the next several years against which to measure subsequent 
groups of students taking the same test. 

Mr. Poshard. But you couldn't establish a norm if you didn't 
have subsequent study after the pre-test and then a follow-up test 
to see how much the student learned. Otherwise you don't have 
any group to test it against. I mean the two groups that are tested 
have to have the same experience or else there's no validity or reli- 
ability to the test. 

Mr. Faithorn. Well, may I defer to my new friend here on my 
right to answer that question because it's his business. 






Mr. Poshard. Well, no, but I'm trying to find out what is actually 
happening. You're saying they're taking it cold. When I read 
your statement, I thought they're giving this test one time to students 
and that's it and then they're going out and letting other students 
study a whole year for the material and take the test. 
There's no reliability or validity to that sort of procedure if that's 
what's occurring. 

Dr. Haney. Could I interject 

Mr. Poshard. My question is does your norm group that you're 
establishing take a pre-test, study like every other student that's 
going to receive the eventual grades on this for a full year and 
then take a post-test and you compare the results for the local dis- 
trict against that norm-referenced group? That's all I want to 
know. 

Dr. Haney. Yes. I think I can help illuminate this in that I've 
talked with Dr. Cannell, Mr. Faithorn's quick friend about this sev- 
eral times. The distinction is that when most publishers norm 
tests, they seek to develop empirical norms both in the Fall and in 
the Spring. 

They choose school systems so as to try, as Dr. Faldet explained, 
to try to have a nationally representative sample of school systems 
all across the Nation. 

Mr. Poshard. I understand that. 

Dr. Haney. And they develop the norms from those testings from 
both the Spring and the Fall. However, when they go to sell those 
tests, school districts, studies have shown, typically select between 
the big test series on many grounds, but primarily on the basis of 
whether or not the test seems to match the local curriculum. 

So when the results are subsequently reported you are in effect 
getting results based on a self-selected group of school systems who 
may have picked that test because there's a better match between 
that test and their curriculum. But the norm group was not select- 
ed because of any such overlap between test and curriculum so in 
that sense you are talking about two quite different groups. 

Mr. Poshard. Yes. Okay. Then I understand that fallacy 

Dr. Haney. One other sort of research finding that may interest 
you and that I think that your question was an excellent one be- 
cause while there's been a great deal of publicity to issues of cheat- 
ing as a result of some of Dr. Cannell's and the 60 Minutes pro- 
gram, a very interesting research report that came out just a few 
months ago a national survey of teachers and school administra- 
tors asking them about test practices. 

The results were treated anonymously so the respondents had no 
reason to cheat, but the results indicated that these people — both 
teachers and administrators— perceived there to be on the order of 
10 percent or less of their colleagues who might have engaged in 
improper test preparation or what might be called cheating. 

But in fact the results indicated that more than 70 percent from 
the systems from which people responded had engaged in what has 
come to be called test curriculum alignment so that they had 
aligned their curricula to better address either the objectives cov- 
ered by the test or the actual items represented on the test. 

The problem is that they were not normed originally on schools 
whose curricula were so aligned and there is some research on the 
ramifications of test instructional overlap on the results. Basically, 
to try to summarize a fair amount of literature very quickly, it 
shows that differences in test instructional or curriculum overlap 
could easily account for the magnitude of Lake Woebegone effect 
that Dr. Cannell found. 
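
An illustrative sketch of the mechanism Dr. Haney describes, under stated assumptions: the norm group is tested without any curriculum alignment, while the districts that later buy the test have aligned their teaching to it. The score scale, the size of the alignment gain (`ALIGNMENT_GAIN`), and all function names are hypothetical and are not drawn from the testimony or from any publisher's procedure.

```python
import random
import statistics

# Hypothetical score scale; none of these numbers come from the hearing.
NORM_MEAN = 50.0
NORM_SD = 10.0
ALIGNMENT_GAIN = 5.0  # assumed average boost from test-curriculum alignment


def simulate_scores(n, mean, sd, seed):
    """Draw n normally distributed scores with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def share_above(scores, cutoff):
    """Fraction of scores strictly above the cutoff."""
    return sum(s > cutoff for s in scores) / len(scores)


# The publisher's norm group: no alignment between curriculum and test.
norm_group = simulate_scores(10_000, NORM_MEAN, NORM_SD, seed=1)
norm_median = statistics.median(norm_group)  # the published "national average"

# Later test takers, compared against the same old norm.
unaligned = simulate_scores(10_000, NORM_MEAN, NORM_SD, seed=2)
aligned = simulate_scores(10_000, NORM_MEAN + ALIGNMENT_GAIN, NORM_SD, seed=3)

print("unaligned share above norm median:", share_above(unaligned, norm_median))
print("aligned share above norm median:  ", share_above(aligned, norm_median))
```

Under these assumptions the unaligned group lands near one half above the norm median, while the aligned group lands well above one half, even though no one cheated.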

Mr. Poshard. I'm sorry, Mr. Chairman. One quick question— so 
then we can be assured that the publishing companies are in effect 
carrying out correct procedures in terms of norm-referencing their 
test in regard to validity and reliability. In other words, we're not 
measuring against a pre-test, and a post-test. Right? 

Dr. Faldet. No, sir. 

Mr. Poshard. Okay. That's good and I accept your explanation of 
the schools actually trying to align themselves in terms of the par- 
ticular test that they give to the students. Thank you. 

Dr. Faldet. But if I may, Mr. Chairman, if they didn't do that, I 
would be disappointed. If they found after the first administration 
of a new standardized test, that their students were woefully weak 
in some language arts skills, and they didn't align their curriculum 
to correct that situation and thus increase the scores hopefully the 
next year, I would be disappointed. They wouldn't be doing the 
instructional job that the tests are helping them to do. 

Mr. Sawyer. Mr. Payne. 

Mr. Payne. Thank you, Mr. Chairman. I will pass since I came in 
late and allow my colleagues to— if there are any other questions. I 
just might make a statement that my opinion of standardized tests 
in general that we do find that in urban areas this new way of test- 
ing has been introduced only more recently in urban areas than 
what we're able to ascertain that for many years standardized tests 
were taught as far as pre-K right on up how to take the tests and 
therefore the natural results are that those who have been trained 
to take those types of tests invariably would do much better by 
virtue of their preparation to do that. 

I question where education begins and proficient test taking 
leaves off and there is, it seems to me, you know, in the environment 
of teaching, where you develop concepts and so forth by just 
practicing standardized testing. It just appears to me that there's 
an absence of education. 

Of course, there has to be a way to test what has been taught, 
but I've been somewhat concerned through the years of testing 
since much of it, as we all know, tends to be culturally biased. I 
just wonder how you might truly be able to test a really intelli- 
gence quotient of a person who has not been exposed to the bias 
that these tests take by virtue of the manner in which they are 
written or prepared. 

So I certainly do not put too much stock in the testing of intelli- 
gence, ability to learn in the results of standardized tests. I've seen 
these types of tests exclude minorities through the years whether it 
was for employment— at one point in a very large company, I hired 
a person who through a summer program as a teenager who took 
the employment test and failed the test for normal entrance into 
employment with this extremely large firm therefore the individual 
would have been unable to work in that company. 

But we, through a back door method, I guess, I was able to continue 
this person on from a summer program and it was not only 
that this person became proficient— now this is a person who would 
have been excluded from a very simple and basic test at that time. 

This person not only did well but went on to become the supervi- 
sor, went on to open a department at a new regional home office 
eight or nine years later. The interesting thing that this individual 
who is still currently maybe in her middle thirties, early forties 
perhaps, is still moving up the ladder. 

That company to this day doesn't know that she's the one that 
failed the test. I might even at one point see if I can find her again 
and maybe discuss some of these situations with her as it relates to 
the fact that she would have been unable to work for that corpora- 
tion based on that test. 

Therefore, that test had no relevance or ability to perform and 
achieve. So I, as I indicated, I missed the testimony, therefore I will 
not ask any specific questions, I just thought I might share those 
feelings with them. 

Mr. Sawyer. Mr. Chairman, you had a question. 

Chairman Hawkins. May I ask Mr. Faldet a question because I 
was reading his prepared statement. On the bottom of page 6 and 
the top of page 7, in effect I was trying to see how the actual test is 
constructed. It is my understanding that what happens on the 
standardized norm-referenced test is that it's designed in such a 
way that the National bell curve will result in 50 percent in effect 
of those taking the test will pass and 50 percent will fail. 

Dr. Faldet. No. No, Mr. Chairman. 

Chairman Hawkins. Would you then correct my understanding. 

Dr. Faldet. Let me correct that impression if it is there. 

Chairman Hawkins. Well it's also in the National Association of 
Secondary School Principals book that I see on testing. They say 
that also 

Dr. Faldet. The only thing I would want to correct is the pass/ 
fail. All of us, no matter on what traits we might be measured, 
someone is going to be the one that scores the highest and someone 
is going to score the lowest. That does not imply that even that 
person scoring the lowest has failed. It just describes. That is the 
measurement concept. The concept of passing or failing or good or 
bad comes only after someone puts a value on a particular score. 

For example, all of us would like to see every student in the 
United States scoring above the 50th percentile on every test. Un- 
fortunately, by definition, that will never happen. As the track gets 
faster, the percentiles change and we say alright there is a new average. 
There is a new median. It's not a pass/fail. That's an evaluation. 
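
A small sketch of Dr. Faldet's point that percentiles describe rank within a reference group rather than passing or failing. The scores and the `percentile_rank` helper are hypothetical, and the definition used here (percent of the reference group scoring strictly below) is one common convention assumed for illustration.

```python
def percentile_rank(score, reference):
    """Percent of reference scores strictly below `score` (one common convention)."""
    below = sum(r < score for r in reference)
    return 100.0 * below / len(reference)


# Hypothetical scores for a norm group of ten students.
old_norm = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

# "The track gets faster": every student improves by ten points.
faster_track = [s + 10 for s in old_norm]

# Ranked against their own group, the percentiles are unchanged.
ranks_old = [percentile_rank(s, old_norm) for s in old_norm]
ranks_new = [percentile_rank(s, faster_track) for s in faster_track]

# Ranked against the stale norm, the same students look better.
ranks_stale = [percentile_rank(s, old_norm) for s in faster_track]

print(ranks_old)
print(ranks_new)
print(ranks_stale)
```

When the group is re-normed against itself, half the students still sit below the midpoint no matter how much everyone improved; only the comparison against the old norm makes most of them appear above it.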

Chairman Hawkins. Well, let's not use that. Let's say 50 percent 
will score above and 50 percent will score below. Well first of all, 
you conduct field tests as I understand it. Then you use the test 
scores and you eliminate those that everyone got correct and everyone 
got incorrect. You select out of that number of questions those 
that are not all together one extreme or the other and then accord- 
ing to the bell curve, 50 percent then are expected or graded as 
above that norm and 50 percent of them are below the norm. 
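
The item-selection step the Chairman describes can be sketched as follows, assuming field-test responses are coded 1 for correct and 0 for incorrect. The data, the function names, and the default cutoffs are hypothetical; actual publishers' selection criteria are not stated in the testimony.

```python
def item_difficulty(responses):
    """Proportion of students answering each item correctly.

    `responses` is a list of per-student lists of 0/1 item scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def select_items(responses, low=0.0, high=1.0):
    """Keep item indexes whose proportion correct lies strictly between low and high.

    With the defaults this drops only the items everyone got right or everyone got wrong.
    """
    difficulties = item_difficulty(responses)
    return [i for i, p in enumerate(difficulties) if low < p < high]


# Hypothetical field test: three students, four items.
field_test = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
]

kept = select_items(field_test)
print(kept)  # item 0 (all correct) and item 1 (all incorrect) are dropped
```

Items that every student answers the same way cannot separate one student from another, which is why a test built to spread scores along a curve discards them.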

Now there's no assurance that any further interpretation is 
going to be put on that test. That is you indicated and indicated 
correctly that a lot depends on how the test is interpreted and certainly 
that's true. But is it not true also that, in effect, we already 
know approximately who's going to be above and who's going to be 
below that average. We can pretty well predict that below the aver- 
age, there will be those students who because of language difficul- 
ties or cultural differences or varying adverse economic conditions 
are going to be in that below the average number. 

We also know that the children from the more affluent families 
with parents where they learn answers although they haven't 
taken the test but they learn the answers from their parents, from 
their home environment. We know that. We know pretty well 
that's how the standardized test is going to come out. 

You say there should be in-service training for the purpose of 
correct interpretations and that's right. But we also know that in- 
service training doesn't take place ordinarily, that those kids who 
are termed, in effect, low achievers are going to be stigmatized obviously 
unless it's accompanied by some other measurement, they 
are going to be classified and rated and forever be subject to that 
low achievement expectation. 

That's a normal situation. It's not your fault. I'm not accusing 
you of anything, but isn't that in reality precisely what takes place. 

Dr. Faldet. That is potentially the fate of someone who scores at 
the tenth— fifteenth percentile— the lower scoring student. If you 
were going to predict where that student will be the next year on 
the appropriate test the following year, you would predict that 
that's where they would be then if there is no intervention, and all 
I'm suggesting and what our interpretative materials suggest is 
that if you have this information and don't do something about it, 
then you might as well not have the information because indeed 
you could predict that score as well from the area the student 
comes from, the socioeconomic class and so forth. 

The information is provided not to confirm that indeed low scor- 
ing students probably come from more deprived neighborhoods, et 
cetera, but to identify and confirm those areas in which that student 
needs some special instructional assistance to, in effect, beat 
the prediction. That's why we do screening tests in medicine. 

It's not to confirm that yes, you have high blood pressure, too 
long Charlie, you're dead. But to take actions appropriate to reme- 
diate, to confirm certainly further diagnostic tests, but to make a 
difference. That's where I think efforts that might be suggested 
through guidelines say look we want to know what test you're 
going to give but we also want your strategy and your ideas and 
your commitment to do something about it when the scores come 
back, whether it's a local test or a nationally prepared one that 
indeed has some other potential values. 

Chairman Hawkins. I'm not accusing test developers and I'm not 
accusing the state. It may sound that way, but isn't the current 
education policy driven by test scores and not by intervention and 
not in-service training and not by teacher development. For an ex- 
ample, when the Secretary of Education calls the schools terrible, 
as he recently did, in effect he's ignoring what can be done to cor- 
rect the very situation that the test scores seem to generate not be- 
cause they're wrong but because we don't follow up. 






I think that's what we're trying to do is see how we can best use 
test scores in the proper way and not as we do now. But we never 
get around to finding the money for intervention for example. 

And so well, I think we agree on at least the implications even 
though we are very slow in getting the solutions. Thank you. 

Mr. Sawyer. Mr. Chairman, you told me when you asked me to 
take the Chair that I had to get you guys out of here by noon. 

Chairman Hawkins. Oh, I'm sorry. 

Mr. Sawyer. I just want to take the prerogative that you've 
given me and the Chair to say thank you for an extraordinary 
hearing — one of remarkable importance and one whose topic I hope 
we can visit again. 

Chairman Hawkins. Thank you for a very remarkable group of 
witnesses. Thank you. 

Mr. Sawyer. If there is no more business to come before us, we 
stand adjourned. 

[Whereupon, the subcommittee was adjourned.] 

[Additional material submitted for the record follows.] 






Friends for Education, Inc. 
600 Girard Boulevard NE 
Albuquerque, New Mexico 87106 
(505) 260-1745 

Working for Accountability in Public Education 
Cannell, President 


The Honorable Augustus Hawkins 
Chairman 
Committee on Education and Labor 
U.S. House of Representatives 
Rayburn House Office Building 
Washington, DC 20515 

June 16, 1990 



Dear Chairman Hawkins: 

Thank you for asking me to testify before the Subcommittee on 
Elementary, Secondary, and Vocational Education. I regret that I was 
unable to come to Washington to personally testify, but I believe our 
Washington representative, Mr. Walter Faithorn, ably presented our 
organization's views to the Subcommittee. As per your request, I hereby 
submit the following written testimony. 

My views on testing in American public schools are expressed in 
detail in The "Lake Woebegone" Report: How Public Educators Cheat on 
Standardized Achievement Tests, a copy of which I enclose. In addition, 
I enclose copies of a few of the many newspaper articles about the "Lake 
Woebegone" cheating scandal, as well as a videotape of recent NBC and 
CBS coverage of the scandal. 



Personal Experiences 

My views of current testing practices in American schools are 
colored by three personal experiences. The first is my experience 
treating adolescent patients over the years, mostly for self-esteem 
problems. As a general physician, I saw child after child, from 
upstanding and caring families, damaged by our school systems. Time and 
time again, I saw functionally illiterate children moved through the 
public schools like so many cattle. These schools' lack of standards 
stood in sharp contrast to the high "standardized" achievement test 
scores the school administrators routinely released to parents and the 
press. 

As I became increasingly suspicious of the public schools' testing 
programs, I started sending many of my patients for outside achievement 
testing by independent testing experts. I found many of these children 
tested well below grade level on independent achievement testing but 
both they and their parents were being told they were achieving "above 
the national average" by school officials. Needless to say, most of 
these children came from disadvantaged backgrounds. 






The second event which colored my view of testing occurred when I 
queried the U.S. Department of Education about the commonly used 
standardized achievement tests. I mistakenly assumed, like many 
Americans, that some branch of the federal government attempted to 
verify the accuracy of the commercial achievement tests our children 
take in public school. After all, they are the product of commercial, 
for-profit corporations that sell and transport goods and services 
across state lines. I was shocked to learn that the U.S. Department of 
Education makes no effort to verify the accuracy of these tests. Unlike 
testing in any other country in the world, the achievement tests given 
to American children and reported to American parents are not regulated, 
verified, or overseen by any agency, private or public. Instead the 
policy of the U.S. Department of Education seems to be: "Let the 
children beware." 

A final incident convinced me that a substantial number of 
American public schools are releasing falsified achievement data to 
parents, taxpayers, and the press. After becoming increasingly 
troubled, I decided to telephone a major test publisher and present 
myself as a superintendent of schools from a small southern Virginia 
school district who was interested in buying one of their tests. I 
called and explained that our board of education was considering 
changing tests, and the members were very interested in improving the 
district's test scores. 

Almost immediately, I was talking to a saleswoman who implied that 
our district's scores would be "above average" if we bought one of their 
tests. She further intimated that our scores would go up every year, at 
least until we changed test questions. 

How could she know that our district would be above the national 
average? The district whose name I used is a poor rural southern 
Virginia district. How could she be sure our scores would go up every 
year? She couldn't know if our district's schools were improving or 
not. 

I had been aware of rumors about cheating in the schools. Many 
teachers privately told me that school personnel [changed] students' 
answer sheets after the test, gave students more than the allotted time, 
used the exact test questions to review for the test, or made copies of 
the test to give to their students. Many teachers complained that 
administrators forced them to teach items known to be on the test, 
claiming they could not get a promotion without producing high test 
scores. 

It became clear why the saleswoman could guarantee scores would go 
up every year as long as we didn't change test questions. The schools 
and the publishers they had under contract were jointly claiming that 
scores were improving because schools were improving. The schools, in 
cooperation with their contract publishers, were teaching the students 
the answers before the test was administered, and then the districts 
reused the same test questions year after year. 

No legitimate standardized test, such as NAEP or the College 
Board, allows school personnel to see the test questions in advance. No 
legitimate test uses the same exact test year after year. In addition, 
legitimate standardized tests only allow 50 percent of the students to 
test "above average." Publishers and local school authorities claimed 
the scores on "Lake Woebegone" tests were improving because the schools 
were improving. However, the actual process under way was increasingly 
efficient revelation to students, before their test, of the questions 
that would be on their test. 

I decided to survey all 50 states to see if any states were 
testing below the publisher's "national norm." Friends for Education 
had not yet obtained any outside funding so I, my nurse, lab technician, 
and X-ray technician called and wrote letters to state education 
departments requesting test information. After obtaining results from 
■ore than 3S00 school districts, we concluded thst 9SX of Aaerican 
school districts, snd all 50 states were clsiaing that their local 
schools were above the national average on caaaercisl st.hievemcnt tests. 
Our study showed that so«e of the poorest, aost desperste school 
districts in the nation sre sble to pacify the press, parents, and 
elected officials by testing "above the national average" on one of 
these shaa coMercial schieveaent tests. 



It is important to note that the tests that give us the "Nation at
Risk" message (the National Assessment of Educational Progress, the
College Entrance Examinations, the international comparisons of student
achievement) are not the tests American school officials use to assess
local school achievement. Instead, within the last twenty years,
American school board members have become dependent on one of five
commercial achievement tests to measure local school progress: the
California Achievement Test, the Stanford Achievement Test, the
Metropolitan Achievement Test, the Comprehensive Test of Basic Skills,
and the Iowa Test of Basic Skills. In the last 15 years, these five
tests have become the principal local yardsticks, the local internal
report cards of American public education.

Just as the National Assessment of Educational Progress (NAEP),
the College Board, and the Armed Services Vocational Aptitude Battery
are used by federal officials to measure America's educational progress,
commercial achievement tests are used by local officials, parents, and
the press to measure local schools' progress. However, commercial
achievement test publishers have not taken any of the simple security
precautions with their product that NAEP, the College Board, or the
Armed Services routinely take with their tests.



Tlie ^ffmtitn on 









Commercial test publishers even sell test preparation materials
which contain review questions taken directly from currently used
commercial tests. For example, the CTB/McGraw-Hill's CAT Learning
Materials unethically preps students on a California Achievement Test
question by telling students how to change a thermometer reading by 10
degrees. One of the questions on the most recent edition of the
California Achievement Test asks students to indicate a thermometer
reading that is 10 degrees higher than the one pictured.

Current testing practices victimize school teachers as well as
children. Teachers around the country have complained bitterly to me
about the extent of unethical testing practices in our schools. Many
teachers were concerned that if they didn't cheat, they would look bad
compared to the teachers who did. All the teachers complained that
cheating is encouraged by their school administrators in order to make
the school's achievement scores look good.

Twenty years ago commercial achievement tests were mainly used for
instructional purposes. Teachers used them to determine which students
were behind and if the class needed more work in one subject than
another. Class scores, school scores, district, and state scores were
either not compiled or not made public. The tests were used to help
children, not to evaluate educators.

However, that changed when state legislatures started insisting on
accountability. Almost overnight, the tests were asked to serve an
accountability purpose instead of just an instructional one. They have
since become the principal local yardsticks of American educational
progress. It seems unlikely that commercial achievement tests will ever
again be solely instructional aids. Therefore, publishers need to
modify the tests to serve their present function.

The glowing press releases, glossy student achievement brochures,
"good news" parent report forms, and optimistic official
"accountability" reports put out by American school officials are
testimony to the fact that public educators themselves now use
commercial achievement tests to measure school quality. And, for the
last 15 years, American educators have found it easier to improve test
scores than to improve public schools.

State legislatures and school boards need accurate measurements of
local achievement. Local officials cannot operate blindly; they need
to know what children know and when they know it. How can local
officials reform American schools when their principal yardsticks tell
them they already have?



I endorse the recommendations of the National Commission on
Testing and Public Policy and suggest you establish a "Truth in Testing"
agency to oversee the development, norming, marketing, administration,
and reporting of standardized achievement tests. However, I believe
such an agency should limit itself to simply protecting the American
consumer against fraudulent testing, such testing that allows all
schools to be "above the national average," or tests that are
administered without basic security procedures. I do not think that a
"Truth in Testing" Agency should attempt to dictate testing policy to
state decision makers. That is, the decision to test, when to test, and
what to do with the resulting scores should continue to be the state's
decision. The agency should only be charged with making sure that such
testing is honest.

Second, I suggest that you direct the Federal Trade Commission to
investigate commercial test publishers. Our attorney feels commercial
test publishers are presently violating current FTC regulations. I
include a copy of our attorney's opinion on the matter.

Third, I suggest that you require the United States Department of
Education to immediately request that commercial publishers of
standardized achievement tests voluntarily comply with the following set
of guidelines. These guidelines are designed to assure that the
selection, use, and reporting of commercial achievement tests by
America's public schools will not misrepresent achievement gains, leave
false impressions of relative achievement, or otherwise deceive the
American public.

1. Publishers of any group administered achievement test shall take
steps to ensure that only one-half of students can test above the
"national norm" on their tests. Specifically, publishers should
only sell current annual norms derived from a nationally
representative sample of students that use their test. This would
require that publishers accept responsibility for their norm's
accuracy by compiling a current annual norm from a nationally
representative sample of students that use their test, and that they
do this annually.

2. Publishers should discourage educators from becoming familiar
with test questions. Some of the publisher's test procedure
recommendations encourage teachers to become familiar with test
content in a manner that invalidates the inferences consumers
naturally make about the overall domain of achievement.

3. Test publishers should instruct users and consumers on the need
for adequate test security, and should clearly state those security
precautions in their test administration manuals. Specifically,
commercial test publishers should sell tests with seals on them, and
with instructions printed on the test that clearly forbid teachers
from reading the test in advance of administration. Publishers
should also recommend that educators deliver tests to the school
shortly before testing, that tests should be given to teachers on
the day of testing, and that outside test proctors be used whenever
possible.

4. Publishers should only sell norm tables that accurately reflect
the percentage of special education and bilingual students that are
currently tested by the public schools.

5. Publishers should keep test content secure, and not allow the
questions on currently used commercial tests to be used as "review"
questions in test preparation materials.

Thank you for holding this hearing, and for requesting my
testimony. If the committee is interested in investigating the extent
of cheating by American school officials, or the effect that fraudulent
testing programs have on teachers' morale, I would attempt to supply
your staff with names of teachers willing to testify.










UNITED STATES DEPARTMENT OF EDUCATION

OFFICE OF THE ASSISTANT SECRETARY FOR
EDUCATIONAL RESEARCH AND IMPROVEMENT

JUL 9 1990



Honorable Augustus F. Hawkins 
House of Representatives
Washington, DC 20515

Dear Mr. Hawkins:

I recently received a copy of written testimony from Mr. Walter
E. Faithorn, Jr. prepared for your hearing on testing, assessment
and evaluation held June 7, 1990.

The testimony references a meeting that was held at the U.S.
Department of Education at the request of Mr. Faithorn on
June [illegible], 1990. I believe that a portion of this statement before
your committee does not accurately reflect what was said by staff
of my office at the meeting. For that reason I want to correct
the record pending before your Committee.

The issue involves "allegations of cheating, fraud, and deceit"
in administration of standardized tests. According to the
testimony, Mr. Faithorn reported that Department staff told him:

     "that not the Congress, nor the States, nor the local
     school boards. . .want the U.S. Department of Education
     messing around in matters of this sort -- telling them
     what they are doing wrong, how this State compares to
     that State, or this school district compares to that,
     etc."

In fact, this view was not expressed at the meeting. Instead, my
staff described the process by which a Federal agency would
acquire a regulatory role in a matter such as administration of
standardized tests and pointed out that the Department of
Education had no such function. They also pointed out, in
agreement with Mr. Faithorn, that issues of norming and test
security are very important to the Federal government. This is
why we replicated Dr. Cannell's first study and have asked
members of the committee in charge of the Code of Fair Testing
Practice in Education to consider issues of test security in the
future. In addition, Department staff advised Mr. Faithorn that
the U.S. Department of Education was also actively working on
other, and possibly larger, issues in testing and assessment:
strengthening the state-of-the-art in testing; articulating the
relationship between testing, instruction and curriculum; making
tests more "authentic" measures of what students are capable of
performing; and improving dissemination of information about
effective practices in testing.

Another point made by Mr. Faithorn was his concern "about a
possible reduction in the rigor with which test security will be
practiced" in the National Assessment of Educational Progress
(NAEP). This matter was not discussed at Mr. Faithorn's meeting
with my staff and I am not aware of the reason this statement was
made. However, it is incorrect and would be contrary to policies
and practices under which the Department carries out the National
Assessment. We have made a special effort to incorporate
procedures into NAEP that maintain test security. We have a
strict item release policy, we maintain the confidentiality of
all students and schools that participate in the assessment, and
we monitor half of the schools in the Trial State Assessment and
no school knows it will be monitored until the day of the
assessment. We have given some consideration to the possibility
of monitoring fewer sites, although of course still unannounced.
But any decision along these lines would follow a careful
evaluation of actual experience in 1990. At present we are
inclined not to make such a reduction.

Another procedure to assure rigor and test security is that all
test booklets are wrapped in plastic and are not opened until the
day of the assessment and all materials are quickly collected and
returned to the NAEP contractor immediately after the testing is
completed. In sum, the Department would not take any action that
would reduce test security or reduce confidence in the validity
of NAEP results. In fact, we are continually working on ways to
improve them.

You have also invited me to prepare a statement for the hearing 
record and I will do that separately within a few days. 




cc: Walter E. Faithorn, Jr. 









U.S. DEPARTMENT OF EDUCATION
OFFICE OF EDUCATIONAL RESEARCH AND IMPROVEMENT

NATIONAL CENTER FOR EDUCATION STATISTICS

JUL 12 1990



Honorable Augustus F. Hawkins
Chairman
Committee on Education and Labor
House of Representatives
Washington, DC 20515

Dear Mr. Chairman:

I appreciate your invitation of June 6 to provide a statement for
the Committee record on the subject of your hearing dealing with
testing, assessment, and evaluation. These areas of education
measurement are of central importance to the National Center for
Education Statistics (NCES) because of the growing interest in
assessing student performance and because NCES uses tests for a
number of its data collection activities. While we do not face
the challenges or needs of schools and districts in relating
institutional goals, curricula, and instructional materials and
methods to testing, we do draw on the available expertise and are
influenced by the same debates about testing in which schools,
districts, States, researchers, policymakers, and the public are
currently engaged.

Let me respond to your questions in turn:

1. What are appropriate measures to assess learning in our
schools?

There is no single test or test format that is appropriate
for measuring all learning in our schools. The measure of
progress toward learning of a specific curriculum requires a
criterion-referenced test (a test that measures how much has
been learned from a well-defined domain of content skills).
This type of test is used in most State and local testing
programs. Measuring progress toward broadly defined
objectives and making comparisons among groups require a
norm-referenced test (a test that compares a student's
skills to those of other students), such as the tests
provided by commercial testing programs and the college
entrance examinations.

Currently available criterion-referenced and norm-referenced
tests are not appropriate as exclusive indicators of the
comparative progress of different educational systems. For
example, the reports by John Cannell that were described to
your Committee clearly indicate that norm-referenced tests
can give misleading results when used for this purpose,
because administrative practices are not uniform. The
National Assessment of Educational Progress (NAEP) is the
only currently available assessment that is specifically
designed to measure trends in the progress of education
systems and make comparisons among states.

2. How can testing and assessment programs at the Federal,
State, and local levels be integrated and interrelated?

It is important to continue separate testing programs at the
national, State, and local levels because each type of
program is specifically designed to serve a different
function, as noted above. Local testing programs should
evaluate student learning of the specific local curriculum,
and they are used to diagnose individual student strengths
and weaknesses, and to assist classroom teachers with
instruction. Local testing programs cannot be aggregated to
evaluate progress across States or the Nation because they
cover different content in different grades, at different
times and under varying procedures. NAEP, on the other
hand, can be used to evaluate progress of the States and the
Nation against a standard set by consensus, but is not
appropriate for evaluation of school districts, schools, or
students because its content is not specifically aligned
with curricula studied in each district and classroom.

However, various assessment programs can be articulated or
connected through different "linking" mechanisms. In the
NAEP Trial State Assessment, NCES is encouraging and
providing technical assistance that will make it possible
for States to link their State testing programs to NAEP.
Once this is done, a State can provide a "NAEP equivalent"
score to all of its students who were tested in the same
subject in the same grade (e.g., for all math students in
the eighth grade) in the State. In this way, a student's,
school's, or district's score would be more valuable because
it could be compared with the benchmark national scale
available in NAEP.

3. How can we minimize the possible adverse effects of testing?

The issue of adverse impact is most relevant to local and
State testing programs that are used to make decisions (such
as promotion, graduation, and program placement) about
individual students, and to such testing programs as the
Scholastic Aptitude Test (SAT). Such tests, if they are
biased or otherwise unfair, may deny students educational
and employment opportunities. The assessments administered
by NCES are not used to make decisions about individual
students, but instead are used to inform policymakers and
educators about the progress of education in the Nation,
regions of the country, States, or various groups of
students such as minority populations. To maximize the
reliability of assessments used to understand the relative
achievement levels of these groups, NCES undertakes vigorous
examinations for bias and other forms of unreliability, prior
to test administration, for every item in assessments NCES
conducts.

The best long term approach to minimize the potential for
adverse impact is to encourage the testing profession to
continue developing professional standards. The two major
documents that deal with this issue are the Joint Technical
Standards and the Code of Fair Testing Practice in Education
(both sponsored by the American Psychological Association,
the American Educational Research Association, the National
Council of Measurement in Education, and other national
groups). These documents have been widely endorsed by
testing programs throughout the country, and represent the
standards to which the profession has agreed to make itself
accountable. This approach to building and maintaining
standards should be refined and continued. The Center
requires its contractors to follow the guidelines in these
documents for tests conducted for NCES.

4. How can comprehensive assessment systems be developed at the
national, State, and local levels that will focus upon
student progress and school improvement for all children?

As I mentioned above in reference to the point about
integration and interrelation of Federal, State, and local
testing and assessment programs, assessment has a unique
role to play at each level. The National Assessment program
is currently a method for obtaining information on how
children in American schools at grades four, eight, and
twelve perform in selected subject areas. It is intended to
serve as an indicator of what American students as a whole
know and can do. The new Trial State Assessment collects
consistent and uniform information about student performance
across all States. This program will provide a way to
understand the relative standing of States in terms of
student achievement in given subjects, such as math and
reading, and the relative strengths and weaknesses within
these broad subject areas, such as the relative performance
in algebra and reasoning skills. It will not provide
information at the district, school, or student level, nor
provide information about what changes ought to be made.

State assessment programs, in contrast, focus specifically
on State level curricula and allow States to evaluate how
well their districts and schools are doing in achieving the
goals of those curricula. District, school, and individual
student testing programs, in conjunction with State
assessments, allow local superintendents, curriculum
specialists, principals and teachers to evaluate the
performance of individual students and to diagnose their
specific strengths and weaknesses at a detailed level. For
example, such testing programs may provide information on
which subtopics within the local curriculum each student has
learned (e.g., in reading comprehension, whether a student
can identify specific information, identify the main idea,
or apply that information to a new problem).

These separate components form a whole assessment system,
when each is implemented at its appropriate level: national,
State, or local. Such a system would provide the
information for educators and parents to know about and gain
insight into student progress and school improvement for all
children. Each level of assessment provides specific and
unique types of information to achieve this objective.



5. What is the appropriate Federal role for improving testing
and assessment at the national, State, and local levels?

In his separate reply to your June 6 letter, Assistant
Secretary Christopher T. Cross addresses the overall issue
of the Federal role in testing and assessment. My comments
deal with the specific activities of the National Center for
Education Statistics.

The Center, as I noted above, administers many tests in
connection with its mandate from Congress to gather and
report data on the condition and progress of education. In
addition to the National Assessment, tests are used in our
longitudinal studies, international achievement comparisons,
and adult literacy assessments. Other areas, such as school
readiness and college level achievement, could be added in
future data collections.

NCES makes use of the strongest and most diverse advice it
can find in developing these tests, but we are now planning
to search more aggressively for approaches to testing that
will make our data more reliable and valid in the future.
We are exploring the possibility of supporting research and
developmental work needed to improve the state of the art
for large scale national and international assessments.
Some of the areas include: incorporating recent findings
in cognitive psychology into educational assessment
instruments, using computer technology to assess the
learning strategies of students, improving psychometric
procedures for "authentic" performance test items, and
improving methods of measuring "opportunity to learn" in
international assessments.




Even though our purpose in these activities is to make it
possible for NCES to report more reliable, valid, and
complete statistics on education, this new knowledge would
be of direct use to States and other sponsors of large
testing programs as well. Thus, NCES would be able to
provide technical assistance to other education data
collectors. In addition, NCES will be supplementing the
activities of the National Cooperative Education Statistics
System so that States and local districts can increase and
improve their efforts to monitor progress toward the
President's and the Nation's Governors' national education
goals. This activity will lead to improved data and
indicators that would be tailored to local conditions and
needs.

Thank you for providing this opportunity to comment on these
important testing, assessment, and evaluation issues you have
addressed in your Committee. If I or members of the NCES staff
can provide further assistance, please let us know.




Emerson J. Elliott
Acting Commissioner







EDUCATIONAL TESTING SERVICE
Princeton, New Jersey 08541

GREGORY R. ANRIG
PRESIDENT                                        July 19, 1990



The Honorable Augustus F. Hawkins 
Chairman 

Committee on Education and Labor
U.S. House of Representatives
B-346C Rayburn House Office Building
Washington, DC 20515

Dear Representative Hawkins:

I want to take this opportunity to tell you how much I admire the
outstanding leadership you have provided in the House of Representatives
over the past 28 years to further the cause of equal employment
opportunity and quality education. It has been a pleasure working with
you and your staff on important educational issues during this
period of time.

I understand that the record is still open from the June 7th
hearing on educational testing and assessment. I would like to respond
to your request for comments on several important issues which, not
surprisingly, are priority concerns for us at ETS.

The issue of appropriate measures to assess learning is of central
concern to us as it is to you. There are many different approaches to
assessing important aspects of student learning. Multiple choice tests
are most widely used for assessment of learning in situations in which
large-scale and low-cost assessment is needed. However, even projects
the scale of the National Assessment of Educational Progress (NAEP)
include non-multiple-choice portions, with 30% of its recent assessments
calling for performance responses by students.

Today there is a great concern for so-called authentic assessment
or performance testing. Such assessment may be as important for the
impact it has on the educational system as for the types of learning it
assesses. At ETS, we have made some exciting advances in large-scale
performance assessment ranging from the National Assessment to scoring
hundreds of thousands of student essays each year for the College
Board's Advanced Placement Program to performance testing in licensing
programs of several types. Such assessment is more expensive than
multiple choice testing as well as requiring more student time and
judgmental scoring, but it is practically feasible.

In addition, we are very excited about the role assessment can play
in improving learning at the classroom level. We have several projects
in which assessment is designed specifically for the purpose of
improving student learning. In an experimental middle school science
program, the teacher uses complex integrative tasks as both instruction
and assessment. In Arts PROPEL, a project in collaboration with the
Pittsburgh Public Schools and Project Zero at Harvard, teachers and
students use the assessment of student portfolios of art and writing as
part of the learning and instruction process. In these and other
activities, we are putting assessment to use directly for student
learning. The resulting assessment is quite different in nature than
assessment designed for judgments of student learning independent of the
classroom learning context.

These differences in assessment we are seeing at different levels
and for different purposes relate directly to the issue you raised of
how to integrate assessment programs at various levels. Clearly, there
needs to be more connection between what is good for the classroom and
what is used for large-scale evaluation and accountability. We suspect
that the route of NAEP with a combination of economical and efficient
measures supplemented by a substantial portion of performance-type
measures is a useful approach for accountability testing at the Federal
and state levels. In the long term, however, as we learn more about the
complex forms of assessment that now seem feasible only at the classroom
level, it may be possible to accomplish a more thorough integration.

In a recent article, I described two other important assessment
concerns, excessive testing in this country and efforts to insure
fairness. I am enclosing that article as well as NAEP background
information for your reference.

I understand that the Committee may hold future hearings on the
subject of educational testing. I would like to offer whatever
information or assistance ETS can provide to help you in your
examination and deliberations on the important issues you are
addressing. Thank you for this opportunity to contribute to the record
of the recent hearing.

I wish you good health and much happiness in your retirement. We
will miss you behind the center seat at hearings and remain grateful for
your tireless efforts on behalf of equal employment and educational
opportunity in America.



Attachments: 
The NAEP Guide 

Brochure on Innovations in NAEP 

Standardized Testing - Now and in the Future, by Gregory R. Anrig -
Article from the Alumni Review of the Harvard Graduate School of
Education, Spring 1990

First two enclosures have been maintained
in Subcommittee files.







STANDARDIZED TESTING - NOW AND IN THE FUTURE

Gregory R. Anrig
President
Educational Testing Service
January 1990



What an extraordinary time for standardized testing in American
education. In September, the President of the United States and the nation's
governors meet in a landmark "education summit" and jointly call for national
performance goals for education and a means to measure progress towards these
goals. In the same month, the Annual Gallup Poll on Public Attitudes Toward
the Public Schools is released. Of those polled, 70 percent favor national
achievement standards and goals, and 77 percent favor the use of standardized
national testing programs to measure the academic achievement of students.

But this is just the tip of an iceberg! The 1980s have seen an
explosion of standardized testing in education. Forty-four states now require
some form of minimum competency tests, 35 of them requiring the use
of state-developed or state-selected tests with state-prescribed performance
standards. Twenty-one states have testing requirements for high school
graduation. Where only a handful of states had testing requirements for the
initial certification of teachers in 1980, this year 45 states have such
testing requirements. And the National Board for Professional Teaching
Standards is in the process of developing new assessments to recognize
advanced teaching ability of experienced teachers.

Some Personal Perspectives on Standardized Tests

I was a consumer of tests before coming to ETS in 1981. I had a healthy
skepticism of standardized tests as a teacher, principal, superintendent, and
state education commissioner. It may surprise you that, after eight years as
ETS President, I still have a healthy skepticism of standardized tests. I am
an educator, first and foremost, and I judge tests and other information on
the basis of how much they help learning and the improvement of education.

Standardized tests do provide educationally useful information when
properly used, properly interpreted, and used in conjunction with other
information before making decisions. They provide a useful "check and
balance" on other information precisely because they ARE standardized in
content and administration. For those who fault this standardization,
consider the alternatives! I remember well the fatigue, stress, and
uncertainty that accompanied the homemade tests I developed and graded at
night as a teacher of junior high school social studies.

It currently is de rigueur to criticize standardized tests in general
and the poor old multiple choice question in particular (a format, by the way,
that reliably measures much more than its critics say and at a lower cost to
the taxpayer or parent). Thanks to C-SPAN, I observed the testimony of
educators before the National Governors' Association Task Force on Education.
One after another decried the use of standardized tests to judge the results
of education.



Prepared for the Spring 1990 edition of the Harvard Graduate School of
Education Alumni Bulletin.






I strongly believe this is an ill-advised position for educators. The
public and their elected officials want to know what students know and are
able to do. They have a right to know this and educators had better find a way
to be responsive. We need to remind ourselves that the education reform
movement of the 1980s got started because of public concern that children were
not learning enough in schools or even as well as they used to. This concern
was justified then, and it still is.

I believe that some standardized and economically feasible way of
assessing what is learned by students will be required of educators as an
outcome of the historic new commitment to national performance goals. Once
this is accepted, then we can focus realistically on the cost, time and
content trade-off issues related to such standardized assessments.

Three Key Issues of Standardized Testing 

Although I applaud the new commitment to goals, I am concerned that it
may lead to an unnecessary proliferation of testing in American education. I
spend much of my time outside of ETS telling people that too much time and
money are being spent now on accountability testing. We test too little too
much. It is like pulling up a carrot to see if it is growing. Can they read?
Can they read? Can they read? We can get a good answer to that question
without testing every child several times every year.

The National Assessment of Educational Progress (NAEP) has demonstrated
methodologies that can avoid the overuse of accountability testing by states.
NAEP assesses samples of knowledge and samples of students accurately and
reliably. ETS has been proud to administer NAEP since 1983 and I believe it
is becoming a creditable "report card" for America's schools. One sign of
that credibility is the fact that 37 states have signed up for the new state
assessments authorized by Congress in 1988. My hope is that this new resource
will help states to reduce the time and money already being devoted to state
accountability testing.
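
As a rough illustration of why sampling can stand in for testing every child,
the sketch below estimates the share of students who can read from a simple
random sample and reports a standard error. It is not NAEP's actual design,
which the testimony does not describe; the function name, the 0/1 coding of
results, and the made-up population are assumptions for illustration only.

```python
import math
import random


def estimate_proportion(population, sample_size, seed=0):
    """Estimate the share of 1s in `population` from a simple random sample.

    `population` is a list of 0/1 results (1 = student can read).
    Returns (estimated proportion, approximate standard error).
    The standard error ignores the finite-population correction.
    """
    rng = random.Random(seed)                      # fixed seed for repeatability
    sample = rng.sample(population, sample_size)   # sampling without replacement
    p = sum(sample) / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    return p, se


if __name__ == "__main__":
    # Hypothetical population: 1,000 students, 700 of whom can read.
    students = [1] * 700 + [0] * 300
    p, se = estimate_proportion(students, 200)
    print(f"estimated share: {p:.2f} (standard error about {se:.3f})")
```

In this sketch the standard error shrinks with the square root of the sample
size, so a few hundred students can already give a usable answer for a much
larger population.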

In addition to urging people not to test so much, I try to counsel them
about the importance of keeping tests in perspective and using them properly
only for the purposes for which they are designed. The SAT, for instance, is a
college admissions test and never was designed to measure the overall quality
of American schools. Yet Secretaries of Education and the media continue to
use it improperly for this purpose. The National Collegiate Athletic
Association decided to use SAT and ACT scores - improperly, in my judgment -
as eligibility criteria for freshman athletics. Arkansas and Texas sought to
use scores from the NTE to determine whether in-service teachers could
continue to teach. In each of these cases, ETS has publicly opposed such
improper test use (even refusing NTE services to Arkansas and Texas) and has
offered pro bono assistance to develop proper alternatives.

In addition to these efforts to promote proper test use, ETS last year joined
with five other major test publishers and publicly adopted a Code of Fair
Testing Practices in Education. I believe that testing organizations like ETS
have a public responsibility not only to develop the best tests possible but
also to be strong advocates for the proper use of these tests.






For me, the most troubling issue regarding standardized testing in
American education is test bias. Tests are made by human beings and therefore
certainly can be biased. Organizations that develop tests have a fundamental
responsibility to guard against test bias. I am proud to say that probably no
organization works harder to assure fairness in testing than ETS. Every
question on every form of every test that ETS develops must go through a
mandatory sensitivity review. Specially trained staff search for any
indication of bias, using structured guidelines and procedures. Committees of
external experts in each discipline scrutinize test items and performance
statistics. Internal and external audit teams annually review adherence of
each testing program to ETS's Standards for Quality and Fairness. ETS and its
clients regularly conduct research on test bias and publish the findings and
data for scrutiny by independent researchers. Those who use ETS-developed
tests are given guidelines and training on their proper use and
interpretation. In addition, a new statistical procedure was introduced in
1987 and now is applied to every ETS-developed test. Called Differential Item
Functioning (DIF), it provides a means to analyze the performance of students
of like ability on each test question, based on the student's race, sex, and
ethnicity, before the question is used for scoring. DIF is a major step ahead
in guarding against test bias, and other testing organizations are following
ETS's lead and are using DIF for their tests as well.
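
The testimony does not say which DIF statistic is computed. A minimal sketch,
assuming the Mantel-Haenszel common odds ratio (one widely used DIF
statistic), with test-takers matched on total score to stand in for "students
of like ability". The record layout, the group labels "ref" and "focal", and
the function name are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Compute a Mantel-Haenszel DIF summary for one test question.

    `records` is an iterable of (group, matched_score, correct) tuples, where
    group is "ref" or "focal" (assumed labels), matched_score is the ability
    stratum (e.g. total test score), and correct is True/False for this item.
    Returns (common odds ratio, delta-scale value), or None if undefined.
    """
    # One 2x2 table per score level: [ref right, ref wrong, focal right, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        index = (0 if correct else 1) + (0 if group == "ref" else 2)
        tables[score][index] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        return None  # not enough mixed data to form a ratio

    alpha = numerator / denominator
    delta = -2.35 * math.log(alpha)  # conventional rescaling to the delta metric
    return alpha, delta
```

A ratio near 1 (delta near 0) means matched students from both groups answer
the question correctly at similar rates; values far from 1 would flag the
question for review before it is used for scoring.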

It is essential that there be continuing scrutiny, debate, research and
critical analysis regarding test bias. Those who develop and use tests or are
affected by them should be part of this ongoing process. And what is learned
should be used to change test development practices. I am troubled, however,
by the trend of some critics of testing and some of the media to define "bias"
simply as meaning any difference in test results by race, sex, or ethnicity.

Unequal educational opportunity regrettably is still a reality in
American education. It is essential that the public spotlight continue to
focus on these unequal opportunities until they are corrected. Tests are an
invaluable resource for demonstrating the profound effects of such
inequalities. In recent years, a number of nationally standardized tests have
begun to report improved academic performance of minorities and women as their
educational opportunities have improved. It is shortsighted advocacy to call
for moratoriums on the best vehicle for promoting public action against
educational inequality.

There is a moral and educational imperative to guard against bias in
standardized tests. ETS and I fully respect and accept our responsibility for
this imperative. But there also is a moral and educational imperative to
determine fairly and report clearly any differences in academic achievement
that exist among students, regardless of race, sex, or ethnicity. To call
this bias is a serious mistake.






The Future: New Kinds of Standardized Tests for New Kinds of Purposes

I came to ETS because I believed it had a unique capacity to help public
education shape clearer and higher expectations for learning and to create a
new generation of standardized assessments to usefully measure this learning.
In 1987, the ETS Board of Trustees approved a five-year plan to achieve these
aspirations and committed a major share of ETS's financial resources to fund
the effort. We are midway in this undertaking and already are seeing what
this new generation of educational assessments can be.

Some of these assessments will be performance based. ETS and Harvard
Professor Howard Gardner are working with teachers in the Pittsburgh Public
Schools on portfolio assessment of student work in art, music, and creative
writing. Here teachers are being trained to assess student work products at
the draft stage in order to guide students to the next level of
accomplishment. In another field, ETS researchers are developing a
computer-based science program for middle school students. Students will solve
problems and conduct experiments, receiving continuous feedback on how they
are progressing. A successor to the NTE is under development that will
involve three stages for teacher licensure. The third stage will be services
to promote state policies for systematic assessment of actual teaching
performance in the classroom as one part of initial licensing requirements for
beginning teachers.

A second characteristic of these new assessments will be that they
increasingly will be instructionally-based. Most current standardized tests
are not very useful for the classroom teacher. Some of the new assessments
will be designed specifically for the teacher. A new publication called
ALGEBRIDGE will be released in 1990-91. It is aimed at introducing middle
school students to algebra. Field tested with teachers and students in six
urban school districts, it provides assessment information to students and
teachers as they tackle basic concepts essential to an understanding of
algebra. The purpose of ALGEBRIDGE is to encourage more urban students to
elect algebra in ninth grade as part of a concerted effort to promote their
access to college. To improve critical thinking skills, a computer-based
program is being developed in several New Jersey and Massachusetts schools as
a part of middle school language arts programs. Again, assessment will be
aimed at giving immediate feedback to students and teachers.

A third characteristic of some of these new assessments will be the use
of technology and new forms of adaptive measurement. In a project for the
National Council of Architectural Registration Boards, computer-based
certification examinations will involve actually doing design projects and
calling upon the standard references found in an architect's office. The GRE
Board has just launched a research and development project to computerize the
Graduate Record Examinations. The computer will simulate actual tasks that
graduate students regularly are called upon to do, such as reference searches,
and will automatically move students to tasks at higher or lower levels of
difficulty depending on their performance. ETS also is developing new adult
literacy assessments that are designed to aid employers and job training
centers in raising literacy skills employees need for the changing workplace.
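
The adaptive behavior described for the computerized examinations, moving a
student to harder or easier tasks depending on performance, can be sketched
with the simplest possible up/down rule. This is only an illustration: the
testimony gives no algorithm, operational adaptive tests generally rely on
statistical item selection, and the ten-level difficulty scale and function
names below are assumptions.

```python
def next_difficulty(current, answered_correctly, lowest=1, highest=10):
    """Step one level up after a correct answer, one level down otherwise.

    The result is clamped to the (assumed) range of difficulty levels.
    """
    step = 1 if answered_correctly else -1
    return max(lowest, min(highest, current + step))


def run_session(responses, start=5):
    """Return the sequence of difficulty levels visited during a session.

    `responses` is a list of True/False results, one per task attempted.
    """
    level = start
    path = [level]
    for answered_correctly in responses:
        level = next_difficulty(level, answered_correctly)
        path.append(level)
    return path


if __name__ == "__main__":
    # Two correct answers followed by a miss: levels 5, 6, 7, then back to 6.
    print(run_session([True, True, False]))
```

Under this rule a student tends to settle around the level where correct and
incorrect answers are about equally likely, which is why fewer tasks are
needed than on a fixed-form test.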






These are very different kinds of assessments from the current
standardized tests familiar to American education. As can be seen, their
purpose is not accountability. Their primary purpose will be to draw upon
advances in technology, cognitive science, and measurement sciences to provide
information that is useful to learners and teachers.

We are at the threshold of dramatic changes in standardized educational
testing. These changes are not limited to ETS's efforts. They are going on
elsewhere as well. At ETS, the focus will be on new assessments that promote
the improvement of learning and of educational opportunities. These changes
are not dreams. They are initiatives already begun that will yield
significant results in the 1990s. This is indeed an extraordinary time for
standardized testing in American education.




FairTest



National Center for Fair & Open Testing 



June 21, 1990

Augustus F. Hawkins
Chairman
Committee on Education and Labor
U.S. House of Representatives
B-346C Rayburn House Office Building
Washington, D.C. 20515

Dear Chairman Hawkins and Members of the Committee:

The National Center for Fair & Open Testing (FairTest) is pleased to respond to the
Committee's invitation to offer testimony in writing on the subject of educational testing and
assessment.

Before entering my discussion, let me summarize FairTest's two broad
recommendations:

- The federal government must stop mandating educationally harmful forms of testing
and assessment.

- The federal government has a potentially valuable role to play in supporting district,
state and federal government development of educationally helpful methods of
assessment.



FairTest is the nation's only organization solely dedicated to making testing and
assessment fair, open and educationally relevant. FairTest has found, however, that because
of lack of accountability by the testing industry, conceptual flaws in the design of most tests,
and the misuse and overuse of tests, much of the testing that is done today is educationally
destructive.¹

Testing exerts its harmful effects in three basic ways. 

First, the most prominent role of testing has been to exclude racial and ethnic
minorities, women and the poor. Indeed, the ability of tests to sort people by these categories
was a major reason for the popularity of early standardized tests. While there have been
instances where testing has opened opportunities, since its early days testing has served
primarily as a gatekeeper.

Second, as standardized, criterion- and norm-referenced multiple-choice tests emerged
as the most important part of school accountability programs in the 1980's, they came to
exert a powerful, often controlling influence on curriculum and instruction in the schools. As
many studies have indicated, they exert the most influence on programs and classrooms
populated by students who score low on tests, because it is those programs which try hardest
to increase test scores. These programs are disproportionately filled with low-income and
minority-group children. As a result of the focus on testing, these students read less, write
less, do fewer projects, do not use their higher order thinking abilities in school, ultimately do
not become proficient students, and frequently drop out. Testing encourages, reinforces and
justifies all these harmful trends.

342 Broadway, Cambridge, Mass. 02139 (617) 864-4810 FAX (617) 497-2224

Third, the very nature of multiple-choice testing presents incorrect ideas of how people
learn. While cognitive and developmental psychology have conclusively shown that humans
learn through active engagement with the world, multiple-choice testing is rooted in outmoded
behaviorist psychology that views learning as the passive accumulation of isolated bits of
information. Even in learning "basic skills," students use higher order thinking processes, but
the tests artificially and incorrectly separate basic from higher order.² As the tests have
come to control curriculum, they have encouraged a completely incorrect approach to
instruction in the effort to raise test scores in the short run.

Taken together, these three points paint a sad picture: too many students are tracked
using tests and placed in "dummied-down" programs where they are not challenged or
stimulated and fail to make adequate educational progress. These students are
disproportionately low income and children of color. The evidence leads to one essential
conclusion: our nation's efforts to construct schools worthy of our children will fail so
long as standardized, multiple-choice testing remains the coin of the educational realm.

However, in our criticism of testing we must not forget two important objectives that
testing promised - but failed - to meet: to provide assessment and evaluation information that
teachers and administrators could use to improve instruction, and to provide information on
student and program performance for accountability purposes. Both these goals must be met,
but they must be met in a manner that does not end up sabotaging the fundamental goal of
improving public education, as the tests have done.

What then can be done and what is the federal role? FairTest makes the following
factual observations and from them offers recommendations.

In state after state across our country, departments of education are working to develop
performance-based assessments. This type of assessment asks students to work on real tasks,
thereby directly demonstrating knowledge and capabilities, rather than fill in bubbles on
multiple-choice questions. This process not only provides valuable information about
achievement, it also fosters instruction that encourages thinking, exploration, reflection,
cooperative learning, and, through them, the acquisition of and ability to use various skills
and factual information.

Plans by states to develop and use performance-based assessments are expanding
rapidly. At the June 1990 Education Commission of the States (ECS) Conference on
Assessment, a number of states that have mandates to develop and use the new assessments
agreed to form a consortium. The states will share resources in developing and analyzing
portfolios, open-ended test items and other forms of performance-based evaluation. This
emerging consortium, to be co-ordinated by ECS, is only one of several being developed.

Additionally, many states are actively engaged in transforming their state assessment
systems. Among these states are California, Connecticut, Vermont, Arizona and Kentucky;
many more are investigating how to begin this process. Also, many districts are actively
engaged in efforts to transform their assessment systems as part of changing to school-based
management and adopting new models of curriculum and instruction.

Performance-based assessment is still emerging, so much remains to be learned and
many problems must be solved. Research and experimentation, however, indicate in outline
form what a performance-based system can look like.

At the classroom level, the essential tool is the portfolio. Portfolios are not simply a
place to dump all a student's papers. Rather, they are tools for reflection and evaluation.
They enable teachers and students, as well as parents and administrators, to see progress
students make toward agreed-upon educational objectives. They facilitate diagnosis of
strengths and weaknesses, indicate the student's individual work that should be done, and
demonstrate the achievement. They also presume that something worthwhile is happening in
the classroom; to fill a portfolio with ditto sheets and answers to multiple-choice questions
taken from basal readers is simply a waste of time.

At the state level, there are two essential assessment tools. One is evaluation of
portfolio work. Vermont, for example, will look at a sample of portfolios in every school in
the state in grades four and eight. This will enable the state to report on student achievement,
note progress and problems, and make recommendations to both schools and individual
teachers. Because teachers will be trained as portfolio evaluators, a great deal of staff
development in new forms of instruction and evaluation can take place. It is important to
note that portfolios can be assessed in ways that provide aggregatable, quantitative data.

The second essential tool for states, and even districts, is the performance-based test.
Such tests are easiest to conceptualize in the arts: one assesses a student's ability to play an
instrument by listening to a recital. Both artistic and athletic competition, such as gymnastics,
have a long history of rating performances with high levels of reliability among the raters.

Performance-based tests can take a variety of forms. On the one end, they can be
"best pieces" from portfolios. That is, an important student project, such as a piece of
scientific research or historical investigation, can be assessed as a test. These are tests that
not only are not secret, but that must be open and serve instructional as well as evaluative
purposes. At the other end are tests in the more traditional sense, only with items that force
students to solve ill-structured and open-ended problems in which they first have to decide
what the problem is, then offer a solution which they can explain and justify. As with
portfolios, performance-based tests provide a basis for staff development and changing
curriculum. They, too, can be used in ways that provide aggregatable, quantitative data.

We must point out here that federally-mandated testing programs, particularly Chapter
1, are perceived as a major obstacle to assessment reform by educational leaders at the state
and district levels. So long as the multiple-choice measures are the essential tools to evaluate
student and program progress, they will control curriculum and instruction and prevent
districts and states from changing assessment and instruction to meet the needs of the
students.

These observations lead to three recommendations:

- The federal government should support research and development at both the district
and state levels in constructing, introducing and evaluating a variety of
performance-based assessments, and support staff development to take advantage of needed
curricular, administrative and assessment reforms.

- The federal government can help develop methods of evaluating, quantifying and
aggregating educational information from performance-based assessments.

- The federal government must stop requiring forms of assessment that are
educationally harmful. In particular, Chapter 1 testing requirements must change not
later than in the 1992 re-authorization, and the National Assessment of Educational
Progress testing must not be allowed to control national education with multiple-choice
testing.



Experience over the past decade has shown that over-emphasis on one form of
assessment, the multiple-choice test (both norm- and criterion-referenced), has harmed our
nation's ability to make needed changes in curriculum and instruction. While teaching to
even a modestly adequate performance-based item would be superior in many ways to
teaching to any multiple-choice test, the danger of educational and evaluative corruption
remains.

For example, in woodworking a performance-based curriculum and assessment could
have a student construct a chest of drawers. Properly used, teaching to this task would have
students explore many alternatives in construction, choose one and defend the choice, then
actually make it. Incorrectly used, the teacher would insist on a narrow range of construction
possibilities (for example, only one kind of joint), teach only that narrow range (indeed,
repetitiously drill on the one joint), in order for the students to do well on just the one
project. The result may be high scores on the chest of drawers, but the students would not
have learned enough to solve any other problems, i.e. make other types of cabinets requiring
other types of joints. Thus, both curriculum and instruction and assessment would be
corrupted: the students would not learn broadly, and to the extent the work sample was
supposed to represent a broader domain of learning cabinetry, the results would be misleading
and invalid.

The problem is that when heavy pressure comes down on administrators and teachers
to ensure that students perform well on narrowly-defined tasks, even if they are
performance-based, they will tend to teach to the test in a narrow way and to the exclusion of
other, needed areas. The tendency is also to over-emphasize what the teacher wants
(regurgitation) to the exclusion of student exploration (guided, active learning). Both
instruction and assessment are thereby damaged, and both students and society suffer. The
question is, how can the federal government, the states or the districts use testing for
accountability purposes without sabotaging the instructional process and narrowing the
curriculum?

At this point, FairTest believes there are several parts to an answer.

First, the primary goal of assessment must be enhancing the quality of instruction.
Making portfolios the basis of assessment, with various types of tests established as
complements, well serves this purpose. Portfolios can then be evaluated in terms of goals too
broad and complex to allow teaching in a narrow manner. The Advanced Placement art
portfolio assessments conducted by Educational Testing Service are an example of this: A
vast array of artwork, including a portfolio of best pieces and slides of a range of work from
each student, are evaluated by teams. Many kinds of art are judged as having artistic merit;
what is essential is the student's display of implementing her or his vision, of having an
artistic voice he or she can put into effect using artistic techniques. There is thus neither
need nor ability to teach narrowly to a narrow test. At the end, however, it is possible to
assign a number, or set of numbers, to each portfolio, on the basis of agreed-upon criteria,
and these numbers can be the basis for quantifiable, aggregatable data.

Second, where testing external to the classroom exists, it should be done on a
sampling basis and there must be sufficient items so that it becomes impossible to teach to
any one or few items. This will require developing a large number of good items and
training evaluators to evaluate such a wide range of items. It also requires developing the
capacity to equate many complex items so quantification becomes possible.

In short, variety, complexity and richness of forms of evaluation, guided by the
understanding that without good activities in the classroom real learning will not take place,
are the only means of dealing with the problem of corruption.

- FairTest recommends that the federal government help fund a variety of assessment
activities, giving primacy to those that encourage staff development through teacher
involvement and that are most useful in instruction. These may be developed by
districts, the states or even the federal government, but must be focused on improving
instruction first and providing aggregatable data second.






In designing and implementing new forms of assessment, many complex questions
must be resolved. Three more are important enough to require consideration here.

First, removing or reducing the use of multiple-choice tests that are biased and
introducing new forms of assessment that encourage thinking does not mean that the new
forms will not be biased. As new assessments are introduced, it is essential that several
things be done to reduce and eliminate bias. One, all students must be enabled to understand
the meaning and processes of the new assessments. Two, the evaluation process constructed
around portfolios must incorporate methods to detect and address teacher bias. Not only are
portfolios a valuable means of helping teachers become better instructors in the subject areas,
they can be valuable methods of helping teachers overcome the ignorance that underlies much
biased behavior. Evaluation through portfolios, coupled with interviews and classroom
observations, can provide a basis for educating most teachers and removing those who refuse
to change and grow.

Second, true performance-based assessments are not likely to have much in common
with multiple-choice tests because they are not likely to measure the same things.³ As a
result, complex problems may develop for longitudinal, continuity data. FairTest strongly
urges that the desire of federal or state agencies to limit performance-based testing in order to
preserve continuity data be subordinated to the far more critical need to introduce
well-developed performance-based assessments in order to assist fundamental school reform.
Controlling the new to meet needs rooted in the old that has failed is only a means to
guarantee continued failure. Research on how to bridge data from the two means of
assessment to continue making use of old data could be useful, but funding for such research
also should be subordinate to developing and implementing performance-based assessment.

Third, issues of reliability and validity of performance-based assessments need
continued investigation so as to enhance their quality. Federal funding to help such studies of
new assessment programs as they are designed and introduced would be a valuable use of the
federal dollar.



In summary, the federal government has a valuable role to play in changing
assessment in our nation. Through well-directed funding and changing certain laws, the
federal government can open up the possibility of using appropriate assessments and hasten
the implementation of high-quality performance-based assessments.

To do the latter, funds must be carefully targeted. It is clear to FairTest that the most
exciting and powerful developments are now happening at the state level, both within
particular states and among states acting in consortia. While many districts and even
individual schools and programs are actively engaged in needed assessment reform, it is at the
state level that change which is both extensive and profound can best be developed. That
said, it is also clear that only when schools, teachers, administrators and parents are actively
involved in the change process can reform really take hold in a comprehensive way.

Therefore, both the states and the districts are essential to the change process, but they
have different roles. The role of the state is to develop and disseminate possibilities for
performance-based assessment, beginning with their own assessments, and including extensive
teacher education in portfolio and other assessments, as Vermont plans to do. The role of the
districts is to implement forms of portfolio-based assessment as the core of instructional
evaluation and to create processes of renewal that unleash the creativity and capabilities of all
people working in and for schools.

The federal government can and should act to facilitate this process. In funding, it
should fund at both the state and district levels, and funding at one level should require
interaction with the other level. States not working with districts are apt to develop unused
procedures or re-visit the failures of top-down dictates. Districts not working with states are
apt to change in isolation and fail to help wider change, or to run afoul of state regulations
that undermine local change.

FairTest thus urges the federal government to proceed in the direction of encouraging,
through funding and changes in law and regulation, the development and implementation of
performance-based assessment that builds from the classroom up and that supports an
instructional process that encourages thought, reflection, activity, engagement and creativity as
ends in themselves and as the best means of developing basic and more advanced academic
skills.

The federal government's steps in this direction should come soon, but must not be
taken too hastily. We urge the federal government to use the principles and guidelines
discussed above, or similar ones emerging now from many sources including state
departments of education and academic researchers. The government should carefully but
expeditiously develop plans to assist fundamental change in assessment in our nation's
educational systems, and thereby enable the needed changes in curriculum and instruction.

Thank you again for the opportunity to testify.

Monty Neill, Ed.D.
Associate Director

Attachments:

Endnotes

Fallout From the Testing Explosion

"Standardized Testing: Harmful to Educational Health"




NOTES 



1. Medina, Noe and Neill, D. Monty, Fallout From the Testing Explosion: How 100 Million
Standardized Exams Undermine Equity and Excellence in America's Public Schools
(Cambridge: FairTest, Third Edition, 1990). This report summarizes, with extensive evidence
and notes, the many problems associated with the tests; it includes an annotated bibliography.
A copy is appended. Portions of the report appeared in revised form in Neill, D. Monty and
Medina, Noe J. "Standardized Testing: Harmful to Educational Health," Phi Delta Kappan
(May 1989) pp. 688-697; a copy is appended.



2. Resnick, Lauren B. and Resnick, Daniel P. "Assessing the Thinking Curriculum: New
Tools for Educational Reform," in B. R. Gifford and M.C. O'Connor, eds., Future
Assessments: Changing Views of Aptitude, Achievement, and Instruction (Boston: Kluwer
Academic Publishers, 1989).



3. Newmann, Fred and Archbald, Doug. Beyond Standardized Testing: Assessing Authentic 
Academic Achievement in the Secondary School (Reston, VA: National Association of 
Secondary School Principals, 1988), see esp. pp. 56-59; Frederiksen, Norman. "The Real 
Test Bias: Influences of Testing on Teaching and Learning," American Psychologist (March 
1984) pp. 193-202. 







Advancing psychology as a science, a profession, and as a means of promoting human welfare 



June 10, 1990 



Committee on Education and Labor 
U.S. House of Representatives 
B-346C Rayburn House Office Building 
Washington, D.C. 20515 



In response to recent concerns over the use of standardized tests in 
education, the American Psychological Association (APA) is submitting this 
statement to the House Education and Labor Committee, following its June 7, 
1990 hearing on Testing in Education. 

APA has historically supported scientific and policy initiatives that 
have improved the development and use of assessment practices and 
instruments. APA and its divisions have developed professional standards in 
these areas which have been widely accepted in legal and legislative arenas 
(e.g., Standards for Educational and Psychological Tests, Code of Fair 
Testing Practices in Education). In addition, APA has several standing and 
ad hoc committees which are charged with addressing critical issues in 
assessment (e.g., Task Force on the Prediction of Dishonesty and Theft in 
Employment Settings, Joint Committee on Testing Practices). 

There are several areas of concern regarding the use of standardized 
tests in education, many of which are outlined in the report From Gatekeeper 
to Gateway, produced by the National Commission on Testing and Public 
Policy. These concerns include the amount of testing that takes place in 
our schools, some inappropriate uses of test score data, and the 
overreliance on test scores alone in making decisions about individuals. 
APA believes that many of the problems associated with standardized tests 
(in education and elsewhere) are not inherent in the tests themselves but 
rather are founded in their inappropriate use. 

If standardized tests are properly normed and validated, they can offer 
information about an individual that cannot be obtained from other sources. 
It has always been deemed inappropriate to use test score data on an 
individual to the exclusion of other information. Test scores taken in 
conjunction with other information (e.g., grades, teacher reviews, etc.) can 
enhance our ability to make better and more informed decisions about an 
individual's educational needs and past achievement. 



1200 Seventeenth Street NW 
Washington, DC 20036 
(202)9&3eOO 








It is also deemed inappropriate to use test score data in a manner 
other than that for which the test was developed. Educational achievement 
tests which are designed as diagnostic tools to aid teachers in meeting an 
individual student's educational needs should not be used as measures of a 
school's educational progress. The opposite is true as well, that measures 
of educational effectiveness, such as the National Assessment of Educational 
Progress (as currently designed), should not be used for individual 
assessment. With this kind of proper use of standardized tests, practices 
such as 'teaching to the test' may be curtailed. 

APA's approach to such issues with testing has been one of education 
and training. We believe that, when properly developed and validated, 
standardized tests can enhance our ability to make important decisions about 
individuals, if they are used as they are intended. 

At present, there seem to be many proposals for advancing methods of 
alternative assessment. APA supports ongoing scientific inquiry into 
alternative assessment approaches and performance-based testing and 
evaluation in education. APA's concern is that alternative measures be 
reliable and valid. Many proposed alternatives - teacher observations, 
exhibitions, portfolios of student work, checklists, and open-ended 
questions - have not been demonstrated to have adequate reliability or 
validity. Historically, standardized measures were developed to correct 
this problem. 

Additionally, proponents of alternative measures see them as correcting 
the problem of cultural bias in testing. In fact, many such alternatives 
have been demonstrated to be more susceptible to idiosyncratic beliefs or 
subjective judgement than traditional standardized measures. Where actual 
differences between groups exist, the introduction of alternative approaches 
may mask but will not eliminate these differences. By masking these 
differences, compensatory strategies designed to enhance opportunities for 
disadvantaged groups may be lost. 

As a developer of professional standards on educational and 
psychological testing, the American Psychological Association remains 
extremely interested in the quality of assessment instruments and measures 
that are used and developed. APA supports the scientific research into 
alternative assessment approaches to assure that any assessment methods used 
to make decisions about individuals be reliable and valid. We look forward 
to the Office of Technology Assessment's study in this area and hope there 
is a strong emphasis on examining the reliability and validity of new 
assessment approaches. 






Sincerely, 

Lewis P. Lipsitt, Ph.D.                 [name illegible] 
Executive Director for Science          Director for Scientific [...] 
American Psychological Association      American Psychological Association 



cc: Committee for Psychological Tests and Assessment 
    Raymond D. Fowler, Ph.D., Chief Executive Officer, 
    American Psychological Association 
    Dianne C. Brown, Testing and Assessment Officer, 
    American Psychological Association 







Council for Basic Education 



A. Graham Down 



Congressman Augustus F. Hawkins 
Chair, Committee on Education and Labor 
B-346C Rayburn House Office Building 
Washington DC 20515 

20 June 1990 



Dear Congressman Hawkins: 



Thank you for your letter dated 6 June 1990 asking for views and 
[...] assessment. This letter [...] 
summarise a ballooning literature on the subject. Supporting 
materials are available if you or your staff need them. 

My own perspective is the effect of testing and assessment on 
curricula and instruction--what children learn. The [...] for 
changing from multiple-choice, machine-scorable tests [...] 
fact that the multiple-choice [...] curriculum, reducing it to a series 
[...] require children only to recognise them, 
[...] use them. American society needs thoughtful adults who 
[...] intelligently the 
resources of a technological world. 

[...] curriculum and instruction which 
[...] multiple-choice tests, it is a national tragedy 
that the Chapter 1 legislation mandates (and Department of 
Education regulations reinforce) a nationally normed and 
aggregatable test which at the moment must be multiple-choice. 
Many states are seeking alternatives, among them your own state 
of California. The California Assessment Program is prepared to 
[...] performance assessments for 
[...] report accurately on the program and 
[...] curriculum for Chapter 1 
children. These efforts need encouragement from your committee. 

[...] to the questions at the bottom of page 1 of your 
letter: 

[...] assessed by asking students to do what we 
want them to do--write, solve problems, display 
[...]. [...] includes direct writing assignments 
[...] groups; portfolios; open-ended 
[...] mathematics and science; experiments and 
[...] materials; simulations, 
debates, and mock trials; problem-solving contests (the 
Odyssey of the Mind, e.g.). 

Assessment programs at the Federal, state, and local levels 
could be integrated by a series of interlocking group 
grading sessions. Let me explain by building on the example 
of the California Assessment Program's highly successful 
grade 8 and grade 12 writing assessments. Each year, 
groups of teachers from across the state score the essays, 
which can be written on up to eight different topics, 
assessing ability to write in different real-life genres. 

Now imagine that other states in the Western Region have a 
similar writing assessment. Ten percent of the papers from 
each state are scored again (anonymously of course) by a 
group drawn from the states represented. The same process 
would go on in other regions of the country--Southeast, 
Atlantic States, Central, etc. 

Finally at the national level, 10 percent of the regional 
papers would be scored by a national committee. 

Why do this instead of expanding NAEP? For these reasons: 

This is not an additional assessment--it uses existing 
state (and/or local) assessments; 

It involves teachers, administrators, parents in 
scoring groups, thus informing them directly about what 
students can do and should be able to do; 

It is a bottom-up, not top-down, process, giving the 
people closest to the classroom ownership of a 
professional responsibility; 

Because of the large number of people involved, 
information about standards is widely disseminated. How 
many people can cite the results of NAEP assessments 
now? 

The process is sometimes called "group moderation." It was 
proposed as a feature of the new English national 
assessment, but was not adopted or funded by the English 
government. The U.S. Congress has an opportunity to 
demonstrate educational leadership here. 

The adverse effects of testing can be minimised by phasing 
out multiple-choice, machine-scorable tests designed by test 
publishers. 

There are minimal adverse effects of performance 
assessments, since many assessments are no different from 
and in some cases better than ordinary classroom activities. 
The New York State Grade 4 students who took the science 
manipulative skills test in May 1989 and May 1990 wrote 
"Thank-you" on their papers and asked could they do the test 
again tomorrow. 







* Comprehensive systems of assessment can be developed by 
expanding the pool of performance assessments and using 
psychometric expertise to develop sound scoring methods. 
Assessments should concentrate on program and school 
assessment, which means that matrix sampling can be used 
widely--not every student needs to be assessed. (However, 
some performance assessments like the New York State science 
tests are so intriguing that no-one wants to be left out!) 

Student assessment should not be used for selection and 
sorting. It should be developed as a profile of the 
student's strengths and weaknesses, with multiple 
indicators, never a single score. 

* The Federal role in improving testing and assessment should 
be leadership, not regulation or imposition of top-down 
assessments like NAEP. The Federal government should specify 
educational outcomes and then assist states and localities 
to meet them. 

The Department of Education should be a resource, 
developing, researching, refining performance assessments. 
It should encourage experimentation at all levels and offer 
expert assistance to state and local education authorities 
seeking to make curriculum and assessment complementary. 
It has an obvious role in coordinating a national "group 
moderation," as described above. 

The issue of cheating on tests (the focus of Cannell's books) is 
not relevant when tests are changed and become performance 
assessments. It is a red herring which distracts from the real 
issue. Cheating on tests has little to do with what children 
learn; it seems to be focused on exposing an irrelevant crime. 
The issues are teaching and learning, and ensuring that 
reasonable demands for accountability do not intrude on them or 
distort them. 

I am available for further information and discussion of this and 
other educational issues. 

Sincerely, 




Ruth Mitchell 
Associate Director 








June 21, 1990 



Honorable Augustus F. Hawkins 
Chairman 
Committee on Education and Labor 
U.S. House of Representatives 
Washington, DC 20515 

Dear Mr. Chairman: 

Thank you for your letter of June 6, 1990, requesting my 
views on the subject of educational testing and assessment. 
I would have been pleased to appear at a hearing on this 
subject because I agree with you that testing and assessment 
must be addressed if we are to improve our nation's educational 
performance. I hope you schedule additional hearings 
on this subject in the near future so that I can have the 
opportunity to explore this complex and important issue with 
you in greater depth than a written statement allows. 

I am gratified that the Congress is taking an interest in 
the role and effects of traditional standardized testing on 
the quality of teaching and student learning in our nation's 
schools. Testing is a major enterprise in our education 
system, driving federal, state and local education dollars as 
well as instructional decisions. The nature and quality of 
the tests we use, and how we use them, are therefore of vital 
significance. 

The 750,000-member AFT has long supported testing, 
chiefly for these reasons: We have no other comparably 
reliable means for determining if and how well the nation's 
youth is being educated and the extent to which our schools 
are discharging their public responsibility. In particular, 
we have no other means for measuring progress toward overcoming 
our past legacy of denying equal educational opportunity 
to poor and minority youngsters and for assessing the 
inequities that continue to exist. Moreover, the public 
deserves--indeed, has a right--to know what we are 
getting for our education dollars. 

But while the AFT supports testing, we are critical of 
the quality of the tests most commonly used in our school 
system and the ways in which they are employed. Briefly 
summarized, the AFT, along with a growing number of testing 







and education experts, has become convinced that these tests 
tend to narrow teaching and learning--indeed, may have 
contributed to the "dumbing down" of America. Additionally, 
existing tests severely constrain promising education reform 
initiatives. 

These problems associated with standardized testing 
deserve serious national attention and a commitment to 
developing reliable, publicly useful assessments that help 
promote educational achievement. Unfortunately, encouraging 
local districts to develop new assessments is not the best 
means to achieve that end. In fact, we fear that this 
well-intended measure would add another layer of testing and 
assessment to already overburdened students and teachers. We 
also do not believe that new, district-developed tests--each 
of which would be different--can yield trend data or 
comparable information, thereby exacerbating the existing 
problems in education reporting. Moreover, since the 
capacity of local school districts in alternative assessment 
is very thin, this measure is likely to add to the already 
plentiful supply of education hucksters that districts are 
prey to while reducing the impact of responsible groups, 
including some states, presently working on new assessments. 
Quality control in developing new assessments is, in short, 
essential. 

Congress should also be aware that the U.S. Department of 
Education's Office of Educational Research and Improvement is 
in the midst of competing many of the federal research and 
development centers, a center on testing among them. Any new 
legislation affecting assessment ought to proceed in light of 
the results of that competition. It also would be appropriate 
for Congress to consider any assessment initiative in 
light of the national education goals adopted by the 
President and the Governors. 

The AFT has offered responsible criticisms of the present 
testing system. We are eager to cooperate with legislative 
and other means to develop assessment systems that not only 
overcome the problems of the present systems but also help to 
stimulate needed improvements in educational achievement. 

Nevertheless, the AFT urges caution when it comes to a 
local, district-based strategy for developing new assessments, 
especially without getting a handle on existing 
testing. We need to address the issue of standardized 






testing as a nation, with such great stakes in the issue. Proposed 
solutions that diffuse authority and responsibility for 
developing a valid assessment system could have tragic 
consequences for our educational system. 

I look forward to further dialogue with you on this 
critical issue. 



Albert Shanker 
President 




AS/dr 

opeiu2aflcio 





THE AMERICAN UNIVERSITY 



WASHINGTON, DC 



The Need for a 
National Assessment of Educational Progress 
in Foreign Language Competence 

by 

Daniele Ghiolfi Rodamar 
Assistant Professor 
American University 

Oversight Hearing on 
Testing in Education 
The Subcommittee on Elementary, 
Secondary, and Vocational Education 

2175 Rayburn H.O.B. 
U.S. Congress 
June 7, 1990 



Department of Language and Foreign Studies 
4400 Massachusetts Avenue. N W . Washington. D C 20016 S045 (202; 8S5'25S! 






Mr. Chairman, members of the Committee, this morning I am 
honored to present testimony. My name is Daniele Rodamar. I am 
an Assistant Professor of French literature and language at 
American University in Washington, D.C. The following testimony 
reflects my experience as a university level foreign language 
instructor for over a decade and as a faculty member with 
responsibility for foreign language curriculum development, 
program coordination, and assessment for elementary and 
intermediate French language courses. I am speaking as an 
individual, and my testimony does not necessarily represent the 
views of American University. 

DEMAND FOR FOREIGN LANGUAGE SKILLS GROWING 

Today's kindergarteners will graduate to a world that will 
provide many opportunities to put foreign language skills to 
work. Language education is a fundamental element of curricula in 
our nation's schools. As Bill Honig, California's Superintendent 
of Public Instruction said in launching a campaign to strengthen 
California's K-12 language education: "Learning a foreign 
language opens many doors for students. It allows them to 
compete in an international job market where proficiency in 
another language is no longer a luxury but a necessity. They 
also better understand our own diverse society and develop 
communication skills necessary to expand their perspectives of 
the world." The trends in trade, foreign investment, 
international tourism, the increasingly global organization of 
business and other factors are increasing the need for foreign 
language skills. 

INFORMATION ON ACHIEVEMENT IS MISSING 

How are we doing in strengthening America's foreign language 








education? While there is some data on "process" variables, such 
as enrollments, "seat time", the number of foreign language 
teachers and so on, we know little at the national level about 
the proficiency of the students who graduate from these language 
programs. Anecdotes (such as the efforts to sell the "no go" 
NOVA Chevy in Latin America, the Pepsi "Come Alive" ad campaign 
that failed in Thailand when it was translated as, "It brings 
your ancestors back from the dead", and President Carter's speech 
that told of his "lust" for the Polish people) suggest that all 
is not well. Two thirds of the translating jobs at the U.S. 
Department of State are filled by foreign-born individuals 
because properly trained American-born candidates are not 
available. The pattern in the private sector does not appear much 
better. The snapshots of language proficiency provided by 
various studies reinforce these concerns about the foreign 
language proficiency of America's students. 

Assessment of educational progress is a fundamental element 
in strengthening educational achievement. This has been 
recognized by the Coalition for the Advancement of Foreign 
Languages and International Studies (CAFLIS) which represents 
165 member organizations from all levels of education, the 
business community, state and local governments, language and 
exchange groups, and others. CAFLIS has called for assessments 
of progress in foreign languages and international studies as 
part of a plan of action for upgrading foreign language and 
international studies education. Assessments of foreign language 
achievement should be a mandated element of the National 
Assessment of Educational Progress (NAEP). If done in a 
responsible and methodologically sound manner, a national 
assessment of educational progress in foreign languages will 
encourage improvements in language education not only by 
providing information on how we are doing but also by 
spotlighting the importance of foreign language education to the 






nation and sending a clear signal that foreign language needs to 
be a core element in the curricula of our nation's schools in the 
elementary and secondary level as well as in our nation's 
colleges and universities. 

A NATIONAL OBJECTIVE: A LANGUAGE COMPETENT AMERICA: 

There has been growing awareness of the need to strengthen 
foreign language education in the United States. In November 
1979 the President's Commission on Foreign Language and 
International Studies pointed with alarm to our citizens' lack of 
international knowledge. As the Chairman of the Commission noted 
in transmitting the study to President Carter, "the hard and 
brutal fact is that our programs and institutions for education 
and training for foreign language and international understanding 
are both currently inadequate and actually falling further 
behind. This growing deficiency must be corrected if we are to 
secure our national objectives as we enter the Twenty First 
Century." By the mid-1980s reports calling to strengthen foreign 
language education began to be made by groups with the power to 
actually influence events in our schools, such as the Council of 
Chief State School Officers, the National Governors' Association 
and the Southern Governors' Association. 

Earlier this year, following the Charlottesville "Education 
Summit", the President and the nation's governors agreed to six 
major goals and twenty six objectives for educational improvement 
by the year 2000. Two objectives relate directly to second 
language study; others are more indirectly related. The 
President and the Governors gave high priority to the development 
of quality assessments to monitor progress toward these 
educational goals and objectives. 

In brief, the increasing importance of foreign languages for 
U.S. security, prosperity, and growth has been increasingly 
recognized by leaders in education, business and government. 




TRANSLATING OBJECTIVES INTO ACTION: 

The increased emphasis on language skills has been 
accompanied by growing enrollments. Recent surveys conducted by 
the Joint National Committee for Languages found that 30 states 
have instituted or increased foreign language requirements in the 
last ten years. Recent figures indicate a ten percent increase 
in foreign language enrollments during that same period. 

The impact of these changes is just beginning to be felt. 
For example, the state of New York's Action Plan to Improve 
Elementary and Secondary Education Results includes a commitment 
to second language education for all students. Beginning with 
the class of 1994, all students will take at least two years of a 
second language prior to grade 9 and additional incentives for 
continuing language study are made in the form of requirements 
for the Regents' Diploma. In California--which has as many 
students as the smallest 24 states combined--the Hughes-Hart 
Education Reform Act of 1983 mandated one year of foreign 
language study as an option to meet high school requirements. 
California's public universities have required at least two years 
of study of a single foreign language for admission, and the 
state's Board of Education has recommended that all high school 
students complete two years of study in a foreign language. 
California enrollments in foreign languages grew by a third 
between 1981 and 1987--but only 14% of the students in 
kindergarten through 12th grade were enrolled. In brief, 
important changes have been initiated and their full impact will 
be felt in coming years. National level information on trends in 
achievement in second language proficiency that can be 
disaggregated to at least the state level is vital in building 
effective foreign language programs. 

THE NEED FOR NATIONAL ASSESSMENT IN FOREIGN LANGUAGES 

University teachers already have high school transcripts and 
advanced placement tests to know what language skills 






students are bringing to campus. In a few states, state 
assessments of foreign language achievement add information. 

While this may be enough to create a wall-chart, this leaves 
postsecondary faculty, as well as K-12 faculty, without a clear 
picture of how the system as a whole is working. In some subject 
areas there is not even a yardstick of achievement: the College 
Board provides widely used achievement tests in German, French, 
Spanish, Italian, Hebrew, and Latin--but not in any of the Asian 
languages. This forces each postsecondary institution to provide 
its own hit-or-miss assessment and sends out a signal to 
students, parents, teachers, and administrators about the 
relative importance of languages. 

A national assessment of educational progress in foreign 
languages is important in getting authorizations and achieving 
funding for foreign language education. The monitoring of 
achievement by a National Assessment of Educational Progress in 
Foreign Languages would spotlight "how we are doing" and would 
send a clear signal that results matter. 

We face major problems in our efforts to improve language 
education. Too often teachers who have very limited proficiency 
in the language they are supposed to be teaching operate without 
effective training, feedback and support. The need to fill 
elementary and secondary classrooms with "a warm body" often pre- 
empts questions about the results. This absence of quality 
information on how we are doing makes it difficult to drive 
improved program performance and improved articulation across 
grade levels. The picture is further clouded when teachers 
pressed with the need to keep students, parents, and 
administrators happy allow grades to creep upward without 
corresponding increases in achievement. The problem is not grade 
inflation by individual teachers. It is more serious than that. 
We simply do not know how the system is working and this lack of 
information moves the emphasis to process rather than results. 




Without information on how the whole system is working, there is 
little systematic pressure to upgrade the quality of teaching, 
and to provide the funding needed for materials, salaries, and 
articulation across grade levels. 

A national assessment of educational progress in foreign 
languages would provide information on how the nation as a whole 
is doing--and on how one state or region is doing relative to 
another. Such information can play a key role in driving 
dissemination of effective programs, building support for 
adequate funding and improving articulation across grade levels. 
The requisite consensus building process can help ensure that 
programs reflect the language competency needs in business, 
industry, agriculture, the professions and government, as well as 
in teaching and research. 

Today we have too little language education too late in the 
educational program. Information on foreign language achievement 
of students at lower grades would provide a fulcrum for 
leveraging improvement and for providing a more realistic time 
table for students to learn foreign languages. This is no small 
matter. As California's Foreign Language Framework put it, "No 
matter how good the pedagogy, students will not become fluent in 
a second language by attending a 50 minute class five times a 
week during a single school year. Mastery of foreign language 
takes time. (In Europe, Japan, and the Soviet Union, for 
example, five to seven years are generally allocated to the study 
of English or another foreign language.) For school 
administrators interested in building a successful language 
program, the requirement for a large block of time has two clear 
implications: First, it signals the need to move the beginning of 
the serious study of language into the kindergarten through grade 
eight years. And second, it highlights the importance of 
district wide strategic planning so that continuity of learning 
is not interrupted." A national foreign language assessment 






would help draw attention to these issues. 

For university and college teachers, such as myself, this 
information would provide a useful basis to work in academic 
alliances to upgrade K-12 language education. The process of 
assessment and interpretation would force K-12 and postsecondary 
education bureaucracies to face the issue of what they are doing 
and what the results are. In teacher training it would help 
provide vital systematic feedback on how the people we are 
turning out with degrees are doing when they find themselves in 
front of a classroom full of typical American kids. This is 
system level feedback that teacher certification or other process 
variables cannot provide. 

In sum, a national assessment of educational progress in 
foreign language education provides information on how the system 
is doing and sends out a signal that language matters and is a 
vital part of the curriculum. State by state and other 
comparisons properly conducted can aid in identifying and 
disseminating models of effective language education. And 
information on what other students are achieving can provide 
useful information to students that motivates their language 
learning strategies. 

THE FOUNDATIONS FOR LANGUAGE ASSESSMENT ARE IN PLACE 

A substantial portion of the research and development 
necessary to institute a national assessment of educational 
progress in foreign languages is already underway. While foreign 
language education--like other areas assessed by NAEP--seeks to 
build a complex of skills and to achieve a variety of goals, 
guidelines for the assessment of foreign language proficiency 
have been developed by the American Council on the Teaching of 
Foreign Languages (ACTFL). Pressed with the need for assessment 
of foreign language proficiency, the U.S. government has long 
conducted assessments of language competency for use in placement 
and as a guide to future training. The state of Connecticut has 






already conducted its own assessment of foreign language 
profiriency in its schools. The Educational Testing Service has 
long provided foreign language achievement tests for use in 
placement of students entering t 'stsecondary education. 

There is already action to move beyond this. The American 
Council on the Teaching of Foreign Languages and the Educational 
Testing Service, working with the testing descriptions developed 
by the U.S. Government Interagency Language Roundtable, have 
initiated efforts to forge a consensus among language educators 
regarding proficiency standards appropriate to traditional 
settings. 

NAEP is the appropriate location for an assessment of 
foreign language achievement. For over two decades NAEP has 
provided valuable information at the national level on the 
quality of educational achievement. The often troubling results of 
these assessments--along with other streams of information such 
as ACT and SAT scores, dropout rates, reports from employers, and 
so on--have helped trigger and sustain the school reform 
movement. NAEP is the only regular national level assessment of 
achievement in core curriculum areas. Under the Hawkins-Stafford 
Act (PL 100-297) NAEP has been expanded to provide a wider 
range of comparisons across core curricula. Adding foreign languages 
to the assessments of NAEP would build on an established 
institution and would send a powerful signal regarding the 
centrality and importance of achievement in foreign language 
education. 



CAVEATS: 

America has benefited from having a highly decentralized 
system of education which allows for diversity in goals and 
approaches and encourages flexibility in meeting local needs. 
Many Americans have viewed national level assessments of 
achievement with extreme caution, aware that no assessment can 
measure everything--and that what is left out may be as important 
as what is included. While an assessment may be more or less 
"curriculum neutral," all assessment scores have a necessary 
correlation with curriculum. In a field as diverse as language 
education, this does not reduce the need and value of assessment 
but it warns against overinterpretation of results. The 
diversity is quite real. A 1976 study by the Articulation 
Council Liaison Committee on Foreign Languages found that not 
even an area perceived to be as central to language instruction 
as vocabulary was standardized. Among 28 elementary and 
intermediate German texts examined, less than five percent of the 
total words listed were common to all texts. Subsequent studies 
showed that student and K-12 teacher perceptions of what was 
expected in postsecondary programs varied greatly. The dynamics 
of consensus about what is important in the rapidly changing 
field of language education make it strongly advisable not to 
attach too much weight to any single measure. 

The applicability of a national level assessment for judging 
the success or failure of individual state or local level program 
reforms is at best questionable because assessment scores may 
change for reasons having little or nothing to do with the 
assessment, including changes in student backgrounds and 
curriculum alignment. This is another reason why NAEP complements 
other information (such as state assessments and SAT scores and 
postsecondary education or employment outcomes) on how we are 
doing. If truth is, as one methodologist claimed, the convergence 
of independent streams of data, then it is vital, in a 
dynamic and diverse system such as our own, that this diversity of 
approaches and measures be preserved. While it is appropriate 
that NAEP inform education debates and programs, the necessary 
imperfections of measurement by any single instrument and the 
importance of encouraging constructive debate among researchers, 
teachers, parents, and students make it essential that NAEP 
continue to complement other data streams rather than preempting 
or defunding them. 

NAEP is valuable in providing an assessment that no one 
aligns curriculum to meet. It informs rather than coerces, and 
as such fits with the best traditions of our nation's education 
system. It is important that NAEP continue to be used in ways 
that inform education, that strengthen rather than undermine 
education. Language education is multidimensional and pursues 
multiple goals: NAEP must acknowledge this. Multiple choice 
tests are helpful, but not enough. NAEP must continue to move 
toward improved and authentic assessment. Assessing competency 
in a language is not the same as testing achievement. Achievement 
tests are constructed to check mastery of some discrete body of 
material covered in a course of instruction. They provide 
feedback, but they typically test for specific, often unconnected 
elements of language. A competency test on the other hand is a 
holistic assessment of what the student can actually do with the 
language in an unrehearsed situation. The student's response to a 
testing prompt is not simply right or wrong; it is indicative of 
a stage of competency and helps define the student's performance 
level. A competency test addresses what can be done now. NAEP 
has emphasized these issues of competency in other assessment 
areas, and should do so in the area of language as well. Process 
and context cannot be ignored if we want to know how programs are 
working. That is why information on variables such as access to 
language education technology, proficiency of Hispanic 
students studying Spanish, and teacher proficiency should be 
provided. 

In brief, while NAEP should be part of a larger system of 
research and feedback, it can provide a useful contribution that 
will play a critical role in improving language education in our 
nation. 

What matters to the nation in language assessment is the 
level of proficiency of students in using a second language. 
Since the ability to use language skills in real world contexts 
is the priority, the assessment of foreign language skills should 
focus on foreign language proficiency: on students' ability to use 
language rather than their achievements in reciting the 
vocabulary or syntax of any particular textbook or group of 
textbooks. The increasing emphasis in many classrooms on real 
world interactions--through telecommunications, non-textbook 
printed materials and so on--makes this emphasis on 'proficiency', 
rather than on 'achievement' in the narrow sense, particularly 
important. This is consistent with the approach used in 
assessments, such as the NAEP reading assessment, which 
emphasizes assessment of the reading skills needed to function in 
today's America. 

The variety of languages studied across our nation and the 
costs of assessment pose the difficult issue of which languages 
to assess. The large majority of American students study Spanish 
and French. Other languages, such as Arabic, Chinese, Japanese, 
and Russian are studied by relatively few students but may be 
deemed of national interest for strategic, economic, and other 
reasons. Here again the consensus building efforts that 
characterize NAEP are particularly appropriate for determining 
which languages to assess and with what periodicity. 

CONCLUSION: 

The establishment of a national assessment of educational 
progress in foreign languages would be important because: (1) it 
would provide vital information to students, teachers, and others 
about how foreign language education programs are working, (2) it 
can help identify foreign language programs that work and 
strengthen the ability to disseminate those programs, and (3) it 
sends out a clear signal to students, parents, teachers, 
administrators, legislators and others that language education at 
the K-12 level is an essential part of the curriculum and that 
competency matters. 

National assessments of achievement in foreign languages are 
an essential tool in upgrading the quality of foreign language 
instruction. I respectfully urge the members of this committee 
and of the U.S. Congress to mandate foreign language assessment 
as a regular component of NAEP. This--along with ongoing input 
from the field and regular Congressional oversight--can play a 
vital role in upgrading language instruction, in helping to meet 
the national goals in education, and in allowing America to 
transform the many challenges that face us in this rapidly 
evolving world economy into opportunities. 
Thank you. 






STATEMENT OF 



FREDERICK H. DIETRICH 

VICE PRESIDENT FOR GUIDANCE, ACCESS, AND 
ASSESSMENT SERVICES 

THE COLLEGE BOARD 



TO 



COMMITTEE ON EDUCATION AND LABOR 
U.S. HOUSE OF REPRESENTATIVES 



WASHINGTON, D.C. 

JUNE 29, 1990 






Members of the Committee on Education and Labor, I am Fred Dietrich, Vice 
President for Guidance, Access, and Assessment Services at the College Board. 
I very much appreciate the opportunity to comment upon testing, assessment and 
evaluation issues currently being considered by the Committee on Education and 
Labor. 

Founded in 1900, the College Board is a national nonprofit association of more 
than 2700 colleges and universities, secondary schools, school systems and 
education associations and agencies. The Board assists students who are 
making the transition from high school to college through services that 
include guidance, admissions, placement, credit by examination and financial 
aid. In addition, the Board also sponsors research, provides forums to 
discuss common problems of education and addresses questions of educational 
standards. 

The College Board firmly believes that quality assessment of student skills 
and achievement is ultimately crucial to the long-term social, economic, and 
political well-being of the United States. Nothing is more important to our 
future economic growth and social progress than education of the highest 
quality. Used sensitively, instruments of assessment can help achieve that 
end. 

Over the last decade, the issue of standards and expectations of students has 
been a particular focus of the College Board's Educational EQuality (EQ) 
Project. EQ's efforts in the first part of the 1980s resulted in a set of 
publications describing "what students should know and be able to do" on 
graduating from high school. Academic Preparation for College, known as the 
Green Book, describes learning outcomes for high school curricula in six basic 
academic subjects--English, the arts, mathematics, science, social studies, 
and foreign language. It also identifies basic academic competencies-- 
reading, writing, speaking and listening, mathematics, reasoning, and 
studying--which depend on, and are further developed by, work in these 
subjects. The "rainbow" series goes further in providing specific curriculum 
and instructional suggestions about how to achieve the results outlined in the 
Green Book. 

EQ's work has involved consensus building among teachers. Hundreds of 
educators from both schools and colleges helped to compile the Academic 
Preparation series. This series does not address specific grade levels but 
rather the learning outcomes which should result from a student's exposure to 
a full educational experience through twelve grades. 

The consensus of educators involved with EQ is that much of the Green Book, 
and in particular the basic academic competencies, are appropriate for both 
college- and work-bound students. We believe it is important to promote high 
academic standards for all students, rather than setting minimum competencies 
for most and tougher expectations for some. The goal should be to give all 
high school students access to the knowledge and skills necessary for entering 
and completing higher education. Some may not go to college right away, but 
we should try to keep their options open. Moreover, employers have told us 
that the EQ basic competencies are what they need in new hires. In terms of 
basic skills, there may be little difference between what is needed by the 
college-bound and those headed for employment. 








Also particularly relevant to your deliberation are two brochures--The 
Educational Equality Project and College Board Examinations and Improving 
Academic Preparation for College: The Role of Assessment--which address the 
"congruence" between our tests and the EQ-defined competencies and skills. We 
will be pleased to provide all these items to the Committee. 

Several College Board instruments could be helpful to your present discussion: 

o Descriptive Tests of Language and Mathematics Skills--designed to 
assess the battery of skills (writing, thinking, reading, analysis, and 
mathematics) that students must have to perform well at the college 
level, closely aligned with the goals described in Academic Preparation 
for College. 

o The Advanced Placement (AP) Examinations--a program of college-level 
courses and examinations for secondary school students in 16 
disciplines. About 37 percent of American secondary schools currently 
participate in the program, serving approximately 17 percent of their 
college-bound students. Those of you who have seen the movie "Stand 
and Deliver" will know how important and valuable this program can be 
for minority students. 

o The Achievement Tests--a series of 15 tests in 14 subject areas taken 
by some 300,000 college-bound students each year and designed to 
measure knowledge, and the ability to apply that knowledge, in specific 
subject areas. 

o The Scholastic Aptitude Test (SAT)--a nationally administered test that 
measures developed verbal and mathematical reasoning abilities related 
to successful performance in college, taken by some 1.8 million 
students each year. 

o The Preliminary Scholastic Aptitude Test/National Merit Scholarship 
Qualifying Test (PSAT/NMSQT)--a school-based test that measures verbal 
and mathematical reasoning abilities important for success in college, 
taken by more than 1.5 million high school sophomores and juniors each 
year. 

You may also be interested to know that the College Board through its office 
in Puerto Rico sponsors the Prueba de Aptitud Academica (PAA), sometimes 
referred to as the "Spanish SAT." The PAA is taken by over 100,000 students 
throughout Puerto Rico, Latin America, and the mainland United States. Not a 
translation of the SAT, the PAA is composed of items developed directly in 
Spanish; like the SAT, the PAA measures two essential types of reasoning: 
verbal and mathematical. Along with the PAA, we also administer a battery of 
subject-matter achievement exams in Spanish. 






These and other standardized tests can be very useful in evaluating what 
students have learned. When properly developed, using the knowledge of 
teachers and other curriculum experts as well as surveys of appropriate 
curriculum and course content, standardized tests can measure many of the 
important learning objectives that schools have for themselves and for their 
students, and do so validly, efficiently, and inexpensively. 

Finally, I should note modes of testing other than traditional 
paper-and-pencil multiple-choice tests. Clearly not all knowledge can be 
measured by these traditional tests. You may be interested to know that we 
are currently exploring a number of modifications to the SAT that include the 
addition of open-response items in which students solve a problem and record 
their answers directly (that is, not via multiple-choice items), as well as a 
writing component (and score) that could include an essay or other direct 
measure of writing ability. These explorations also include the development 
of what we call "proficiency scaling," through which additional information 
will be generated about what particular scores on the SAT (on its verbal, 
mathematics, and writing components) mean in terms of what students are able 
to do. 

Perhaps the most promising news of all in efforts to measure what individual 
students know and are prepared to do is the development of computer-delivered 
tests. The College Board's first application of computerized adaptive testing 
has been in a series of tests of skills in college English and mathematics 
known as Computerized Placement Tests. The program is being expanded to 
include assessment of mathematics at higher skill levels. We are also 
investigating other applications of computerized adaptive testing, including a 
battery of assessment tools and accompanying guidance materials for use with 
students at the middle school level and those with limited English proficiency. 

What is so encouraging about this kind of test is that it can utilize student 
responses to previous questions to select later questions in order to more 
accurately describe individual students' abilities and needs. It's almost a 
different test for each student, created by the student's own level of ability 
and knowledge. These tests will require much less time to take than paper and 
pencil tests, and will provide the option of immediate scoring and feedback to 
facilitate counseling, course placement, and other forms of advisement, and 
provide more useful diagnostic information. 

The College Board is pleased to offer assistance in using existing tests 
and/or in developing additional ones. As I have tried to describe in this 
statement, the College Board has long experience in measuring higher order 
thinking skills, in using tests to inspire advanced levels of learning, and in 
setting common educational standards and goals through consensus-building 
activities. 

Thank you again for the opportunity to present this statement. We look 
forward to working with the Committee on Education and Labor on these 
important educational issues. 






